Mike Karels [Fri, 29 Jul 2022 14:23:23 +0000 (09:23 -0500)]
IPv6: fix problem with duplicate port assignment with v4-mapped addrs
In in_pcb_lport_dest(), if an IPv6 socket does not match any other IPv6
socket using in6_pcblookup_local(), and if the socket can also connect
to IPv4 (the INP_IPV4 vflag is set), check for IPv4 matches as well.
Otherwise, we can allocate a port that is used by an IPv4 socket
(possibly one created from IPv6 via the same procedure), and then
connect() can fail with EADDRINUSE, when it could have succeeded if
the bound port was not in use.
En-Wei Wu [Mon, 1 Aug 2022 19:40:13 +0000 (19:40 +0000)]
wtap(4): Fix bug in wtap_node_write() and wtap_vap_create()
Originally, wtap_node_write() gets the wrong softc by iterating V_inet and
gets the ifp by string comparison, then gets softc by ifp->if_softc.
However, ifp->if_softc will not point to the correct softc owned by
ieee80211com, and thus causes a kernel panic.
Fix it by assigning softc to cdev's si_drv1 in wtap_vap_create()
and get the softc directly via dev->si_drv1 in wtap_node_write().
The cdev created by wtap_vap_create() use the name of ieee80211com
rather than the vap's name. It will cause the second vap based on
the same ieee80211com as first vap fail to create a device node
because the device node is already exists. Fix it by assigning
vap->iv_ifp->if_xname to cdev's name.
routing: refactor private KPI
* Make nhgrp_get_nhops() return const struct weightened_nhop to
indicate that the list is immutable
* Make nhgrp_get_group() return the actual group, instead of
group+weight.
tests: fix unix_passfd_dgram:rights_creds_payload after be1f485d7d6b
The test was failing due to the assert on lack of MSG_TRUNC flag in the
output flags of recvmsg().
The code passed MSG_TRUNC, along with sufficient-size buffer to hold the
message to-be-received to the recvmsg(), and expected MSG_TRUNC to be
returned as well.
This is not exactly correct as a) MSG_TRUNC was not even a supported
recvmsg() flag before be1f485d7d6b and b) it violates POSIX, as
POSIX states it should be set only "If a message is too long to fit in
the supplied buffers,".
The test was working before as the kernel copied input flags to the
output flags. be1f485d7d6b changed that behaviour to clear MSG_TRUNC
if it was present on the input.
routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving
state between various functions inside the routing subsystem.
As all of the fields are filled in directly by the customers, it
is problematic to maintain consistency, resulting in repeated checks
inside many functions. Additionally, there are multiple ways of
specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on.
With the upcoming nhop(9) kpi it is possible to store all of the
required state in the nexthops in the consistent fashion, reducing the
need to use "info" in the KPI calls.
Finally, rt_addrinfo structure format was derived from the rtsock wire
format, which is different from other kernel routing users or netlink.
This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.
routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property.
Move its storage to nexthop to simplify upcoming nhop(9) KPI changes
and netlink introduction.
Kirk McKusick [Mon, 1 Aug 2022 05:07:20 +0000 (22:07 -0700)]
Identify each UFS/FFS superblock integrity check as a warning or fatal error.
Identify each of the superblock validation checks as either a
warning or a fatal error. Any integrity check that can cause a
system hang or crash is marked as fatal. Those that may simply
lead to poor file layoutor other less good operating conditions
are marked as warning.
Normally both fatal and warning are treated as errors and prevent
the superblock from being loaded. A new flag, UFS_NOWARNFAIL, is
added. When passed to ffs_sbget() it will note warnings that it
finds, but will still proceed with loading the superblock. Note
that when UFS_NOWARNFAIL is used, it also includes UFS_NOHASHFAIL.
No legitimate superblocks should fail as a result of these changes.
Kirk McKusick [Mon, 1 Aug 2022 03:28:04 +0000 (20:28 -0700)]
Updates to UFS/FFS superblock integrity checks when reading a superblock.
Further updates based on analysis of the way the fields are used
in the various filesystem macros defined in fs.h.
Eliminate several checks for non-negative values where the fields
are checked for specific values. Since these specific values are
non-negative, if the value is a verified positive value then it
cannot be negative and such a check is redundant and unnecessary.
No legitimate superblocks should fail as a result of these changes.
Add a flags parameter to the ffs_sbget() function that reads UFS superblocks.
Rather than trying to shoehorn flags into the requested superblock
address, create a separate flags parameter to the ffs_sbget()
function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is
used both in the kernel and in user-level utilities through export
to the sbget() function in the libufs(3) library (see sbget(3)
for details). The kernel uses ffs_sbget() when mounting UFS
filesystems, in the glabel(8) and gjournal(8) GEOM utilities,
and in the standalone library used when booting the system
from a UFS root filesystem.
The ffs_sbget() function reads the superblock located at the byte
offset specified by its sblockloc parameter. The value UFS_STDSB
may be specified for sblockloc to request that the standard
location for the superblock be read.
The two existing options are now flags:
UFS_NOHASHFAIL will note if the check hash is wrong but will still
return the superblock. This is used by the bootstrap code to
give the system a chance to come up so that fsck can be run to
correct the problem.
UFS_NOMSG indicates that superblock inconsistency error messages
should not be printed. It is used by programs like fsck that
want to print their own error message and programs like glabel(8)
that just want to know if a UFS filesystem exists on a partition.
One additional flag is added:
UFS_NOCSUM causes only the superblock itself to be returned, but does
not read in any auxiliary data structures like the cylinder group
summary information. It is used by clients like glabel(8) that
just want to check for possible filesystem types. Using UFS_NOCSUM
skips the superblock checks for csum data which allows superblocks
that have corrupted csum data to be read and used.
The validate_sblock() function checks that the superblock has not
been corrupted in a way that can crash or hang the system. Unless
the UFS_NOMSG flag is specified, it will print out any errors that
it finds. Prior to this commit, validate_sblock() returned as soon
as it found an inconsistency so would print at most one message.
It now does all its checks so when UFS_NOMSG has not been specified
will print out everything that it finds inconsistent.
By convention, kernel threads must call kthread_exit() instead of
blindly returning from the thread function. We have some safety measure
in fork_exit(), which checks for the P_KPROC p_flag and does
kthread_exit() for kernel thread that forgot to do it itself.
But this workaround only works for kernel threads belonging to the
kernel process. If a kernel thread is attached to the normal process
with live userspace, and does not call kthread_exit(), then the
workaround is not activated, and for amd64 at least, the return from the
thread function/fork_exit() results in the return to userspace with the
copy of frame from the thread that did kthread_add().
Practically for smrstress, this destroys the user stack of the still
active frame in the other thread, which was the caller of kthread_add().
Fix it by adding kthread_exit() to the thread function.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35999
Alan Cox [Fri, 29 Jul 2022 06:14:46 +0000 (01:14 -0500)]
iommu_gas: Eliminate redundant parameters and push down lock acquisition
Since IOMMU map entries store a reference to the domain in which they
reside, there is no need to pass the domain to iommu_gas_free_entry(),
iommu_gas_free_space(), and iommu_gas_free_region().
Push down the acquisition and release of the IOMMU domain lock into
iommu_gas_free_space() and iommu_gas_free_region().
Both of these changes allow for simplifications in the callers of the
functions without really complicating the functions themselves.
Moreover, the latter change eliminates the direct use of the IOMMU
domain lock from the x86-specific DMAR code.
sockets: add MSG_TRUNC flag handling for recvfrom()/recvmsg().
Implement Linux-variant of MSG_TRUNC input flag used in recv(), recvfrom() and recvmsg().
Posix defines MSG_TRUNC as an output flag, indicating packet/datagram truncation.
Linux extended it a while (~15+ years) ago to act as input flag,
resulting in returning the full packet size regarless of the input
buffer size.
It's a (relatively) popular pattern to do recvmsg( MSG_PEEK | MSG_TRUNC) to get the
packet size, allocate the buffer and issue another call to fetch the packet.
In particular, it's popular in userland netlink code, which is the primary driving factor of this change.
This commit implements the MSG_TRUNC support for SOCK_DGRAM sockets (udp, unix and all soreceive_generic() users).
Warner Losh [Sat, 30 Jul 2022 11:01:47 +0000 (05:01 -0600)]
stand: Add a helper 'universe' target
Add a shortcut for invokging ${SRCTOP}/tools/boot/universe.sh by
creating a 'universe' target in src/stand. This will make it easier to
test out all the different combinations of boot loaders that we build.
Warner Losh [Sat, 30 Jul 2022 10:43:21 +0000 (04:43 -0600)]
stand: Move quit command to common commands
Since both EFI and the future kboot will benefit from a 'quit' command,
move it from efi/loader/main.c to common/commands.c. In EFI this command
exits back to the boot loader (which will cause the next BootXXXX in the
BootOrder list to be attempted). In kboot, this will exit back to
whatever called loader.kboot. In uboot this will cause a reset (which
will restart uboot, not quite a simple exit, but will look similar)
and in OFW it will execute OF_exit which should return to the
openfirmware prompt.
Sometimes we need to reset a PCIe bus, but sometimes it breaks the
downstream device(s). Since, from my testing, this is only needed for
Radeon cards installed in the AmigaOne machines because the card was
already initialized by firmware, make the reset dependent on a device
hint (hint.pcib.X.reset=1). With this, AmigaOne X5000 machines can have
other devices in the secondary PCIe slots.
Fix for 90e2971 that caused some geli commands to return the wrong exit status.
The reported problem is that some geli commands exit with a
success status when they should exit with a failed status.
The gctl_error() function is defined differently in the kernel
(in sys/geom/geom_ctl.c) versus in the geom user facilities (in
sbin/geom/misc/subr.c). In the kernel, calling gctl_error() causes
an error return to be set while in the user version it does not.
It was only by a quirk that had been added to the user geom return
processing that I "cleaned up" that the lack of the user implementaion
to set the error return showed up.
This patch adds the missing setting of the error code when calling
the user facility gctl_error().
While working on new and updates to drivers more structs, fields,
functions, .. were found, had to be shuffled around, ..
Some of these are (so far still dummy) functions or not properly
typed fields. The IEEE80211_HE_ constants are all still dummy.
This was msotly as a start to make new (out-of-tree) things compile.
Sponsored by: The FreeBSD Foundation (minor VHT/chan width bits)
MFC after: 1 week
LinuxKPI: skbuff: sort list header and add new (dummy) functions
While working on new and updates to drivers more skbuff changes
came up. Sort out the list/prev/next header problem and add more
(so far dummy) functions needed.
net80211: VHT correct check/option in ieee80211_vht_adjust_channel()
In ieee80211_vht_adjust_channel() we have to check for all possible
IEEE80211_FVHT_VHT* options using the mask rather than just checking
for IEEE80211_FVHT_VHT; ieee80211_vhtchanflags() (contrary to its
HT counterpart) only returns the "highest" flag nor or-ing them together
with the base flag. For the moment this seems to make sense as with
more width options we'd add a pyramid.
Later on, in the same function when we get VHT160 actually go and look
for VHT160 and not VHT80.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D35977
This is similar to 5d48fb3b16c1496bf415fee620c61cc944b0326d.
With LLVM14 the .data.rel.ro ELF section appears after .data,
making loader behave erractly and kernel is not loaded.
This patch makes ensures the correct order.
Based on discussion at:
https://github.com/llvm/llvm-project/issues/56306
MFC after: 1 day
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
John Baldwin [Thu, 28 Jul 2022 22:55:10 +0000 (15:55 -0700)]
pmap_mapdev: Consistently use vm_paddr_t for the first argument.
The devmap variants used vm_offset_t for some reason, and a few places
explicitly cast bus addresses to vm_offset_t. (Probably those casts
along with similar casts for vm_size_t should just be removed and
instead permit the compiler to DTRT.)
Warner Losh [Tue, 26 Jul 2022 23:39:45 +0000 (17:39 -0600)]
kboot: Make console raw when we start
Put the console into raw mode on startup. This allows the menus to work
as expected. Boot is now interruptable.
Note: Likely should restore the terminal settings on most exists. It's
not clear the best way to do this, and most shells have an auto stty
sane anyway, so note it for future improvement.
Warner Losh [Tue, 26 Jul 2022 23:31:23 +0000 (17:31 -0600)]
kboot: implement stripped down termios
Implement a stripped down termios, obtained from various files in musl
and HOST_ or host_ prepended to most things and a few unavoidable style
tweaks. Only implements the bits of termios we need for the boot loader:
put the terminal into raw mode, restore terminal settings and speed
stuff.
Warner Losh [Thu, 28 Jul 2022 21:18:08 +0000 (15:18 -0600)]
kboot: Add host_exit and use it to implement exit()
Clients of libsa are expected to implement exit(). The current exit just
loops forever. It is better to really exit: when running as init that
will reboot the system. When not running as init, other programs can
recover (not that we support running as init, but when we do in the
future, this is still the rigtht thing).
Warner Losh [Mon, 11 Jul 2022 23:49:11 +0000 (17:49 -0600)]
kboot: aarch64 support
Add support for aarch64. exec.c and ldscript are copied from the EFI
version with #ifdefs for the differences. Once complete, I'll refactor
them. host_syscall.S implements a generic system call. tramp.S is a
first attempt to create a tramoline that we can use to jump to the
aarch64 kernel. Add aarch64-specific startup and stat files as well.
exec.c tweaked slightly to avoid bringing in bi_load(), which will come
in later. Includes tweaks to stat due to name differences between names
on different Linux architectures.
James Skon [Thu, 28 Jul 2022 19:58:31 +0000 (21:58 +0200)]
altq: improve pfctl config time for large numbers of queues
In the current implementation of altq_hfsc.c, whne new queues are being
added (by pfctl), each queue is added to the tail of the siblings linked
list under the parent queue.
On a system with many queues (50,000+) this leads to very long load
times at the insertion process must scan the entire list for every new
queue,
Since this list is unordered, this changes merely adds the new queue to
the head of the list rather than the tail.
Mark Johnston [Thu, 28 Jul 2022 13:38:52 +0000 (09:38 -0400)]
riscv: Avoid passing invalid addresses to pmap_fault()
After the addition of SV48 support, VIRT_IS_VALID() did not exclude
addresses that are in the SV39 address space hole but not in the SV48
address space hole. This can result in mishandling of accesses to that
range when in SV39 mode.
Fix the problem by modifying VIRT_IS_VALID() to use the runtime address
space bounds. Then, if the address is invalid, and pcb_onfault is set,
give vm_fault_trap() a chance to veto the access instead of panicking.
PR: 265439
Reviewed by: jhb
Reported and tested by: Robert Morris <rtm@lcs.mit.edu>
Fixes: 31218f3209ac ("riscv: Add support for enabling SV48 mode")
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35952
The use of 'package' in this could be understood to mean a FreeBSD
package provided by pkg, rather than the fact that we use data provided
by IANA. Re-word it to clearly identify `tzdata` as the IANA Time Zone
Database on first use, then drop subsequent uses of the word 'package'.
Reviewed by: 0mp, pauamma, philip
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35966
Andrew Gallatin [Thu, 28 Jul 2022 14:36:22 +0000 (10:36 -0400)]
lagg: fix lagg ifioctl after SIOCSIFCAPNV
Lagg was broken by SIOCSIFCAPNV when all underlying devices
support SIOCSIFCAPNV. This change updates lagg to work with
SIOCSIFCAPNV and if_capabilities2.
Warner Losh [Thu, 28 Jul 2022 13:58:32 +0000 (07:58 -0600)]
Style(9): Strengthen statements about not using K&R function definitions
K&R function definitions will soon be obsolete. Work has been underway
to remove all K&R function definitions from the tree for a while now. A
future C version will remove this construct from the language. So
strengthen existing statements about K&R function definitions and
declarations.
While here, remove __P macro reference. It's not been in active use for
almost two decades apart from legacy contrib code.
Warner Losh [Thu, 28 Jul 2022 05:11:12 +0000 (23:11 -0600)]
kboot: Remove RELOC defines, it's unused
This was copied from powerpc/ofw and has never been used. We also don't
care about -DAIM. It's only relevant for in-kernel structures, which we
don't use in this userland program.
Adjust function definition in isa's pnp.c to avoid clang 15 warning
With clang 15, the following -Werror warning is produced:
sys/isa/pnp.c:118:24: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
pnp_send_initiation_key()
^
void
This is because pnp_send_initiation_key() is declared with a (void)
argument list, but defined with an empty argument list. Make the
definition match the declaration.
Suppress possible unused variable warning for icl_soft.c
With clang 15, the following -Werror warning is produced on i386:
sys/dev/iscsi//icl_soft.c:1277:6: error: variable 'i' set but not used [-Werror,-Wunused-but-set-variable]
int i;
^
The 'i' variable is used later in the icl_soft_conn_pdu_get_bio()
function, via the PHYS_TO_DMAP() macro. However, on i386 and some other
architectures, this macro is defined to panic immediately, so in those
cases, 'i' is indeed not used. Suppress the warning by marking 'i' as
unused.
- Clear CR2, EFER, and R8-15 to zero.
- Reset DR6 and DR7 to their documented reset values.
- Reset interrupt shadow state.
- Document the reason CR0 is reset to a value that doesn't match its
documented value.
On physical systems the ram isn't initialized on boot. So, coreboot uses
the cache as ram in this boot phase. When exiting cache as ram, coreboot
calls INVD for making the cache consistent.
In a virtual environment ram is always initialized and the cache is
always consistent. So, we can safely ignore this call.
QAT in-tree driver ported from out-of-tree release available
from 01.org.
The driver exposes complete cryptography and data compression
API in the kernel and integrates with Open Crypto Framework.
Details of supported operations, devices and usage can be found
in man and on 01.org.
Patch co-authored by: Krzysztof Zdziarski <krzysztofx.zdziarski@intel.com>
Patch co-authored by: Michal Jaraczewski <michalx.jaraczewski@intel.com>
Patch co-authored by: Michal Gulbicki <michalx.gulbicki@intel.com>
Patch co-authored by: Julian Grajkowski <julianx.grajkowski@intel.com>
Patch co-authored by: Piotr Kasierski <piotrx.kasierski@intel.com>
Patch co-authored by: Adam Czupryna <adamx.czupryna@intel.com>
Patch co-authored by: Konrad Zelazny <konradx.zelazny@intel.com>
Patch co-authored by: Katarzyna Rucinska <katarzynax.kargol@intel.com>
Patch co-authored by: Lukasz Kolodzinski <lukaszx.kolodzinski@intel.com>
Patch co-authored by: Zbigniew Jedlinski <zbigniewx.jedlinski@intel.com>
Mark Johnston [Wed, 27 Jul 2022 14:55:40 +0000 (10:55 -0400)]
qat: Rename to qat_c2xxx and remove support for modern chipsets
A replacement QAT driver will be imported, but this replacement does not
support Atom C2xxx hardware. So, the existing driver will be kept
around to provide opencrypto offload support for those chipsets.
Reviewed by: pauamma, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35817