Mark Johnston [Fri, 5 Aug 2022 20:25:05 +0000 (16:25 -0400)]
bpf: Fix BIOCPROMISC locking
BPF might put an interface in promiscuous mode when handling the
BIOCSDLT ioctl. When this happens, a flag is set in the BPF descriptor
so that the old interface can be restored when the BPF descriptor is
destroyed.
The BIOCPROMISC ioctl can also be used to put a BPF descriptor's
interface into promiscuous mode, but there was nothing synchronizing the
flag. Fix this by modifying the ioctl handler to acquire the global BPF
mutex, which is used to synchronize ifpromisc() calls elsewhere in BPF.
Reviewed by: kp, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36045
Mark Johnston [Fri, 5 Aug 2022 20:21:09 +0000 (16:21 -0400)]
arm: Clear TTBCR before enabling the MMU
Upon reset, this register is supposed to have a value of zero. But when
booting certain v7 CPUs in QEMU, we enter the kernel with several bits
set, including the EAE bit, which enables ARM's PAE extension. I'm not
sure if QEMU is setting it or if it's the uboot loader. Because FreeBSD
doesn't implement that extension and uses regular 32-bit page tables,
the kernel hangs immediately after enabling the MMU.
Just clear everything in TTBCR before enabling the MMU, to match the
reset value. FreeBSD doesn't toggle anything in that register.
PR: 251187
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36041
Mark Johnston [Fri, 5 Aug 2022 17:07:54 +0000 (13:07 -0400)]
makefs: Add ZFS support
This allows one to take a staged directory tree and create a file
consisting of a ZFS pool with one or more datasets that contain the
contents of the directory tree. This is useful for creating virtual
machine images without using the kernel to create a pool; "zpool create"
requires root privileges and currently is not permitted in jails.
makefs -t zfs also provides reproducible images by using a fixed seed
for pseudo-random number generation, used for generating GUIDs and hash
salts. makefs -t zfs requires relatively little by way of machine
resources.
The "zpool_reguid" rc.conf setting can be used to ask a FreeBSD guest to
generate a unique pool GUID upon first boot.
A small number of pool and dataset properties are supported. The pool
is backed by a single disk vdev. Data is always checksummed using
Fletcher-4, no redundant copies are made, and no compression is used.
The manual page documents supported pool and filesystem properties.
The implementation uses a few pieces of ZFS support from with the boot
loader, especially definitions for various on-disk structures, but is
otherwise standalone and in particular doesn't depend on OpenZFS.
This feature should be treated as experimental for now, i.e., important
data shouldn't be trusted to a makefs-created pool, and the command-line
interface is subject to change.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35248
Mark Adler [Sat, 30 Jul 2022 22:51:11 +0000 (15:51 -0700)]
zlib: Fix a bug when getting a gzip header extra field with inflate().
If the extra field was larger than the space the user provided with
inflateGetHeader(), and if multiple calls of inflate() delivered
the extra header data, then there could be a buffer overflow of the
provided space. This commit assures that provided space is not
exceeded.
Justin Hibbits [Fri, 5 Aug 2022 01:28:21 +0000 (21:28 -0400)]
powerpc/fsl_sata: Properly clamp maxio to pessimized size
The CAM 'maxio' is a 'pessimized' size, assuming 4k pages and one page
per segment. Since there are at most 63 segments in a transaction with
this driver, and one would necessarily be the indirect segment marker,
clamp the maxio to the minimum of maxphys (tunable) or (63 - 1) pages
(248k).
Kirk McKusick [Thu, 4 Aug 2022 23:06:43 +0000 (16:06 -0700)]
Drop checks with last alternate superblock in fsck_ffs(8).
The fsck_ffs(8) utility made sanity checks of critical superblock
fields by comparing the values of those fields in the standard
superblock againt the values of those fields in the last alternate
superblock. The code for validating a superblock now cover all the
checked fields as well as many more. Further the checks done are
far more comprehensive. So we now drop the alternate superblock
checks as they no longer provide value. Dropping these checks also
eliminates the need to read the alternate superblock.
Ed Maste [Thu, 4 Aug 2022 20:52:23 +0000 (16:52 -0400)]
libc: drop "All rights reserved" from Foundation copyrights
This has already been done for most files that have the Foundation as
the only listed copyright holder. Do it now for files that list
multiple copyright holders, but have the Foundation copyright in its own
section.
Steve Kargl [Thu, 4 Aug 2022 17:31:57 +0000 (19:31 +0200)]
[libm] Correct comments in s_cbrt[l].c
Damian McGuckin <damianm at esi dot com dot au> noted that the accuracy
claims in the code for cbrt(3) and cbrtl(3) were incorrect. Fix the
comments to more accurately describe the accuracies.
Warner Losh [Wed, 3 Aug 2022 16:50:14 +0000 (10:50 -0600)]
stand: use snprintf here
This code was written prior to snprintf being in the then libstand (now
libsa). Since we have it, use it for extra safety. The code already
tries to be safe, but since we have snprintf as well, the added layer of
protection will suffice. The current code reserves 16 bytes (plus a NUL)
at the end for worst case of inet_ntoa, which is still a little
pessimal, but safe from overflow.
Summary:
This allows installing packages that depend on kerberos libraries
without pulling in all the binaries. It also moves libgssapi to runtime
to allow installing kerbereos libraries without adding a dependancy on
the large utilities package. It makes sense to put libgssapi in runtime
rather than kerberos-lib since this is a plugin layer which is intended
to support any GSS-API mechanisms, not just kerberos.
A good example of a package which uses kerberos libraries without
needing the kerberos utilities is sshd. This uses the kerberos GSS-API
libraries to implement its GSSAPIAuthentication option.
Mark Johnston [Wed, 3 Aug 2022 00:32:17 +0000 (20:32 -0400)]
ctfconvert: Give bitfield types names distinct from the base type
CTF integers have an explicit width and so can be used to represent
bitfields. Bitfield types emitted by ctfconvert(1) share the name of
the base integer type, so a struct field with type "unsigned int : 15"
will have a type named "unsigned int".
To avoid ambiguity when looking up types by name, add a suffix to names
of bitfield types to distinguish them from the base type. Then, if
ctfmerge happens to order bitfield types before the corresponding base
type in a CTF file, a name lookup will return the base type, which is
always going to be the desired behaviour.
PR: 265403
Reported by: cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
ktrace: change AST handler to require AST flag set
When it was inline it made sense to depend on the existing nested check
in KTRUSERRET() rather than adding a new td_flags flag. However, since
we now have a TDA_KTRACE flag anyway, we might as well check it and
avoid the call.
Explicitly pass the struct thread argument.
Move the function prototype from sys/systm.h to geom/geom.h, we do not
need almost each kernel source to see the prototype, it is now used
only by kern/vfs_mountroot.c outside geom/geom_event.c, where the
function is defined.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Make most AST handlers dynamically registered. This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.
The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it. In fact, this
is already present behavior for hwpmc.ko and ufs.ko. I do not think it
is worth the efforts and the runtime overhead to try to fix it.
amd64: expicitly re-init td_frame in copy_thread()
Otherwise we are using whatever the value was left from the previous
thread run on kernel entry from usermode. Typically it would be the
desired value as is, but it is not guaranteed.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888
Doug Moore [Tue, 2 Aug 2022 16:19:46 +0000 (11:19 -0500)]
rb_tree: update augmentation after element change
For an augmented rb_tree, allow a faster alternative to removing an
element from the tree, tweaking it slightly, and inserting it back
into the tree, knowing that its relative position in the tree is
unchanged. Instead, just change the element and invoke
RB_UPDATE_AUGMENT to fix the augmentation data for all the nodes in
the tree.
Mike Karels [Sun, 31 Jul 2022 18:57:03 +0000 (13:57 -0500)]
tcp.4: Add missing sysctls, other fixes
Add some of the missing sysctls to tcp.4, using references to other
man pages where they exist. Added sysctls include recvbuf and sendbuf
controls for automatic buffer sizing. Updated recvspace and sendspace.
Add sysctl.8 to "see also" and intro to variable section. Rename
"MIB Variables" section to "MIB (sysctl) Variables", as most people
will associate with sysctl.
Mike Karels [Thu, 21 Jul 2022 13:17:03 +0000 (08:17 -0500)]
inet.4 icmp.4 udp.4: Add missing sysctls, other fixes
Add missing sysctls to inet.4 and icmp.4, using references to ip.4
for variables and groups documented there. Add sysctl.8 to "see also"
and intro to variable section. Rename "MIB Variables" section to
"MIB (sysctl) Variables", as most people will associate with sysctl.
Revise history: the ICMP implementation was in 4.2BSD.
Mike Karels [Fri, 29 Jul 2022 14:23:23 +0000 (09:23 -0500)]
IPv6: fix problem with duplicate port assignment with v4-mapped addrs
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6
socket using in6_pcblookup_local(), and if the socket can also connect
to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well.
Otherwise, we can allocate a port that is used by an IPv4 socket
(possibly one created from IPv6 via the same procedure), and then
connect() can fail with EADDRINUSE, when it could have succeeded if
the bound port was not in use.
En-Wei Wu [Mon, 1 Aug 2022 19:40:13 +0000 (19:40 +0000)]
wtap(4): Fix bug in wtap_node_write() and wtap_vap_create()
Originally, wtap_node_write() gets the wrong softc by iterating V_inet and
gets the ifp by string comparison, then gets softc by ifp->if_softc.
However, ifp->if_softc will not point to the correct softc owned by
ieee80211com, and thus causes a kernel panic.
Fix it by assigning softc to cdev's si_drv1 in wtap_vap_create()
and get the softc directly via dev->si_drv1 in wtap_node_write().
The cdev created by wtap_vap_create() use the name of ieee80211com
rather than the vap's name. It will cause the second vap based on
the same ieee80211com as first vap fail to create a device node
because the device node is already exists. Fix it by assigning
vap->iv_ifp->if_xname to cdev's name.
routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
group+weight.
tests: fix unix_passfd_dgram:rights_creds_payload after be1f485d7d6b
The test was failing due to the assert on lack of MSG_TRUNC flag in the
output flags of recvmsg().
The code passed MSG_TRUNC, along with sufficient-size buffer to hold the
message to-be-received to the recvmsg(), and expected MSG_TRUNC to be
returned as well.
This is not exactly correct as a) MSG_TRUNC was not even a supported
recvmsg() flag before be1f485d7d6b and b) it violates POSIX, as
POSIX states it should be set only "If a message is too long to fit in
the supplied buffers,".
The test was working before as the kernel copied input flags to the
output flags. be1f485d7d6b changed that behaviour to clear MSG_TRUNC
if it was present on the input.
routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving
state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
is problematic to maintain consistency, resulting in repeated checks
inside many functions. Additionally, there are multiple ways of
specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
required state in the nexthops in the consistent fashion, reducing the
need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
format, which is different from other kernel routing users or netlink.
This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.
routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
and netlink introduction.
Kirk McKusick [Mon, 1 Aug 2022 05:07:20 +0000 (22:07 -0700)]
Identify each UFS/FFS superblock integrity check as a warning or fatal error.
Identify each of the superblock validation checks as either a
warning or a fatal error. Any integrity check that can cause a
system hang or crash is marked as fatal. Those that may simply
lead to poor file layoutor other less good operating conditions
are marked as warning.
Normally both fatal and warning are treated as errors and prevent
the superblock from being loaded. A new flag, UFS_NOWARNFAIL, is
added. When passed to ffs_sbget() it will note warnings that it
finds, but will still proceed with loading the superblock. Note
that when UFS_NOWARNFAIL is used, it also includes UFS_NOHASHFAIL.
No legitimate superblocks should fail as a result of these changes.
Kirk McKusick [Mon, 1 Aug 2022 03:28:04 +0000 (20:28 -0700)]
Updates to UFS/FFS superblock integrity checks when reading a superblock.
Further updates based on analysis of the way the fields are used
in the various filesystem macros defined in fs.h.
Eliminate several checks for non-negative values where the fields
are checked for specific values. Since these specific values are
non-negative, if the value is a verified positive value then it
cannot be negative and such a check is redundant and unnecessary.
No legitimate superblocks should fail as a result of these changes.
Add a flags parameter to the ffs_sbget() function that reads UFS superblocks.
Rather than trying to shoehorn flags into the requested superblock
address, create a separate flags parameter to the ffs_sbget()
function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is
used both in the kernel and in user-level utilities through export
to the sbget() function in the libufs(3) library (see sbget(3)
for details). The kernel uses ffs_sbget() when mounting UFS
filesystems, in the glabel(8) and gjournal(8) GEOM utilities,
and in the standalone library used when booting the system
from a UFS root filesystem.
The ffs_sbget() function reads the superblock located at the byte
offset specified by its sblockloc parameter. The value UFS_STDSB
may be specified for sblockloc to request that the standard
location for the superblock be read.
The two existing options are now flags:
UFS_NOHASHFAIL will note if the check hash is wrong but will still
return the superblock. This is used by the bootstrap code to
give the system a chance to come up so that fsck can be run to
correct the problem.
UFS_NOMSG indicates that superblock inconsistency error messages
should not be printed. It is used by programs like fsck that
want to print their own error message and programs like glabel(8)
that just want to know if a UFS filesystem exists on a partition.
One additional flag is added:
UFS_NOCSUM causes only the superblock itself to be returned, but does
not read in any auxiliary data structures like the cylinder group
summary information. It is used by clients like glabel(8) that
just want to check for possible filesystem types. Using UFS_NOCSUM
skips the superblock checks for csum data which allows superblocks
that have corrupted csum data to be read and used.
The validate_sblock() function checks that the superblock has not
been corrupted in a way that can crash or hang the system. Unless
the UFS_NOMSG flag is specified, it will print out any errors that
it finds. Prior to this commit, validate_sblock() returned as soon
as it found an inconsistency so would print at most one message.
It now does all its checks so when UFS_NOMSG has not been specified
will print out everything that it finds inconsistent.
By convention, kernel threads must call kthread_exit() instead of
blindly returning from the thread function. We have some safety measure
in fork_exit(), which checks for the P_KPROC p_flag and does
kthread_exit() for kernel thread that forgot to do it itself.
But this workaround only works for kernel threads belonging to the
kernel process. If a kernel thread is attached to the normal process
with live userspace, and does not call kthread_exit(), then the
workaround is not activated, and for amd64 at least, the return from the
thread function/fork_exit() results in the return to userspace with the
copy of frame from the thread that did kthread_add().
Practically for smrstress, this destroys the user stack of the still
active frame in the other thread, which was the caller of kthread_add().
Fix it by adding kthread_exit() to the thread function.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35999
Alan Cox [Fri, 29 Jul 2022 06:14:46 +0000 (01:14 -0500)]
iommu_gas: Eliminate redundant parameters and push down lock acquisition
Since IOMMU map entries store a reference to the domain in which they
reside, there is no need to pass the domain to iommu_gas_free_entry(),
iommu_gas_free_space(), and iommu_gas_free_region().
Push down the acquisition and release of the IOMMU domain lock into
iommu_gas_free_space() and iommu_gas_free_region().
Both of these changes allow for simplifications in the callers of the
functions without really complicating the functions themselves.
Moreover, the latter change eliminates the direct use of the IOMMU
domain lock from the x86-specific DMAR code.
sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg().
Implement Linux-variant of MSG_TRUNC input flag used in recv(), recvfrom() and recvmsg().
Posix defines MSG_TRUNC as an output flag, indicating packet/datagram truncation.
Linux extended it a while (~15+ years) ago to act as input flag,
resulting in returning the full packet size regarless of the input
buffer size.
It's a (relatively) popular pattern to do recvmsg( MSG_PEEK | MSG_TRUNC) to get the
packet size, allocate the buffer and issue another call to fetch the packet.
In particular, it's popular in userland netlink code, which is the primary driving factor of this change.
This commit implements the MSG_TRUNC support for SOCK_DGRAM sockets (udp, unix and all soreceive_generic() users).
Warner Losh [Sat, 30 Jul 2022 11:01:47 +0000 (05:01 -0600)]
stand: Add a helper 'universe' target
Add a shortcut for invokging ${SRCTOP}/tools/boot/universe.sh by
creating a 'universe' target in src/stand. This will make it easier to
test out all the different combinations of boot loaders that we build.
Warner Losh [Sat, 30 Jul 2022 10:43:21 +0000 (04:43 -0600)]
stand: Move quit command to common commands
Since both EFI and the future kboot will benefit from a 'quit' command,
move it from efi/loader/main.c to common/commands.c. In EFI this command
exits back to the boot loader (which will cause the next BootXXXX in the
BootOrder list to be attempted). In kboot, this will exit back to
whatever called loader.kboot. In uboot this will cause a reset (which
will restart uboot, not quite a simple exit, but will look similar)
and in OFW it will execute OF_exit which should return to the
openfirmware prompt.