Mike Karels [Sat, 27 Jan 2024 15:40:07 +0000 (09:40 -0600)]
inet(3): clarify syntax accepted by inet_pton
The section INTERNET ADDRESSES describes the acceptance of dotted
values with varying number of parts in multiple bases. This applies
to inet_aton and inet_addr, but not to inet_pton. Clarify this
section by listing the functions to which this applies. Move the
description of what inet_pton accepts into this section from STANDARDS,
where it is easily missed. Rename the section to clarify that it
applies only to IPv4. (inet_pton also works with IPv6.)
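
As an illustration of the distinction drawn above, the following is a minimal sketch that wraps inet_addr() and inet_pton() in two helpers. The helper names and the sample strings are assumptions chosen for this example; they do not come from the manual page.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Returns 1 if inet_addr() parses the string, 0 otherwise. */
int
accepted_by_inet_addr(const char *s)
{
	return (inet_addr(s) != INADDR_NONE);
}

/* Returns 1 if inet_pton(AF_INET, ...) parses the string, 0 otherwise. */
int
accepted_by_inet_pton(const char *s)
{
	struct in_addr addr;

	return (inet_pton(AF_INET, s, &addr) == 1);
}
```

A two-part value such as "127.1" or a hexadecimal part such as "0x7f.0.0.1" is expected to pass the first helper only, while a plain dotted quad such as "127.0.0.1" passes both.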
Xin LI [Sat, 27 Jan 2024 03:09:39 +0000 (19:09 -0800)]
releng-gce: Advertise the availability of UEFI support in GCE images.
The amd64 and arm64 images support UEFI; mark them as such so users can
take advantage of UEFI boot on GCE. This was already done for FreeBSD
14.0-RELEASE but was never codified in the release tools (and should be).
Ed Maste [Thu, 25 Jan 2024 01:47:36 +0000 (20:47 -0500)]
makefs: warn that ffs sectorsize other than 512 may not work
newfs always sets sectorsize to DEV_BSIZE (512) and derives some other
values based on the number of 512-byte sectors per real sector. Similar
logic is required in makefs. Until that happens, emit a warning that
the image may be incorrect.
Olivier Certner [Thu, 18 Jan 2024 13:10:18 +0000 (14:10 +0100)]
SCHEDULER_STOPPED(): Rely on a global variable
A commit from 2012 (5d7380f8e34f0083, r228424) introduced
'td_stopsched', on the ground that a global variable would cause all
CPUs to have a copy of it in their cache, and consequently of all other
variables sharing the same cache line.
This is really a problem only if that cache line sees relatively
frequent modifications. This was unlikely to be the case back then
because nearby variables are almost never modified as well. In any
case, today we have a new tool at our disposal to ensure that this
variable goes into a read-mostly section containing frequently-accessed
variables ('__read_frequently'). Most of the cache lines covering this
section are likely to always be in every CPU cache. This makes the
second reason stated in the commit message (ensuring the field is in the
same cache line as some lock-related fields, since these are accessed in
close proximity) moot, as well as the second order effect of requiring
an additional line to be present in the cache (the one containing the
new 'scheduler_stopped' boolean, see below).
From a pure logical point of view, whether the scheduler is stopped is
a global state and is certainly not a per-thread quality.
Consequently, remove 'td_stopsched', which immediately frees a byte in
'struct thread'. Currently, the latter's size (and layout) stays
unchanged, but some of the later re-orderings will probably benefit from
this removal. Available bytes at the original position for
'td_stopsched' have been made explicit with the addition of the
'_td_pad0' member.
Store the global state in the new 'scheduler_stopped' boolean, which is
annotated with '__read_frequently'.
Replace uses of SCHEDULER_STOPPED_TD() with SCHEDULER_STOPPED() and
remove the former as it is now unnecessary.
Reviewed by: markj, kib
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43572
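
The idea of testing one global flag instead of a per-thread field can be sketched in userland as follows. This is an illustration under assumptions, not the kernel code: '__read_frequently' is kernel-specific and omitted, __builtin_expect() stands in for a branch-prediction hint, and may_sleep() is a hypothetical caller.

```c
#include <stdbool.h>

/*
 * Userland stand-in for the global flag.  In the kernel the variable is
 * annotated with '__read_frequently'; that annotation is omitted here.
 */
bool scheduler_stopped = false;

/* Sketch of the macro: test the global and hint that it is usually false. */
#define	SCHEDULER_STOPPED()	__builtin_expect(scheduler_stopped, 0)

/* Hypothetical caller: refuse to block once the scheduler is stopped. */
int
may_sleep(void)
{
	return (SCHEDULER_STOPPED() ? 0 : 1);
}
```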
Olivier Certner [Thu, 18 Jan 2024 10:31:59 +0000 (11:31 +0100)]
SCHEDULER_STOPPED(): Move it (back) to 'systm.h'
It's not an assertion, so doesn't logically belong to 'kassert.h'.
Moreover, a subsequent commit will make it rely on a variable whose
declaration also belongs to 'systm.h'.
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43571
Olivier Certner [Thu, 18 Jan 2024 10:15:18 +0000 (11:15 +0100)]
panic()/KERNEL_PANICKED(): Move back to using 'panicstr' as a flag
Currently, no performance-critical path tests for a panic. Moreover, we
today have KERNEL_PANICKED() which wraps the test into
__predict_false(), already catering to those (potential) use cases.
Also, in practice we don't support 64-bit architectures without caches,
so reading an 'int' instead of a pointer doesn't (directly) save any
memory access. Finally, 'panicked' is redundant with 'panicstr' (and
wastes a tiny amount of memory).
Consequently:
1. Use again 'panicstr' as a flag indicating that the system is
panicking. To this end:
- Modify panic() so that it ensures this pointer is set to some
non-NULL value even if the caller didn't pass any panic string.
- Modify KERNEL_PANICKED() to test for 'panicstr'.
- Remove 'panicked'.
2. Annotate 'panicstr' with '__read_mostly' (instead of using
'__read_frequently' as for 'panicked'). This may have to be changed if,
in the future, some performance-intensive path needs to test it.
3. Convert a few more direct tests of 'panicstr' to using
KERNEL_PANICKED().
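
The first item above can be sketched as follows. This is a userland model under assumptions: record_panic() is a hypothetical stand-in for the relevant part of panic(), the placeholder string is invented for the example, and '__read_mostly' is omitted because it is kernel-specific.

```c
#include <stddef.h>

/* Userland stand-in for the kernel's 'panicstr' pointer. */
const char *panicstr = NULL;

/* Sketch of the test: the system is panicking iff 'panicstr' is set. */
#define	KERNEL_PANICKED()	__builtin_expect(panicstr != NULL, 0)

/*
 * Hypothetical model of the relevant part of panic(): record the message,
 * substituting a placeholder when the caller passed none, so that the
 * pointer is always non-NULL afterwards.
 */
void
record_panic(const char *msg)
{
	panicstr = (msg != NULL) ? msg : "(no panic string)";
}
```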
Mark Johnston [Fri, 26 Jan 2024 15:35:40 +0000 (10:35 -0500)]
arm64: Remove pmap_san_bootstrap() and call kasan_init_early() directly
pmap_san_bootstrap() doesn't really do much, and it was hard-coding the
bootstrap stack size defined in locore.S. Moreover, the name is a
bit confusing given the existence of pmap_bootstrap_san(). Just remove
it and call kasan_init_early() directly like we do on amd64. It will
not be used by KMSAN in a forthcoming patch series.
No functional change intended.
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43403
Ed Maste [Fri, 26 Jan 2024 15:19:04 +0000 (10:19 -0500)]
open: make non-POSIX errno value more apparent
In the errno list, add an explicit note and reference to the note in the
STANDARDS section.
When O_NOFOLLOW is specified and the target is a symbolic link FreeBSD
sets errno to a value different than that specified by POSIX. Commit
295159dfa3ed added a note to this effect, but I missed it when reading
through the list of errno values.
PR: 214633
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43618
tcp: commonize check for more data to send, style changes
Use SEQ_SUB instead of a plain subtraction, for an implicit
type conversion and prevention of a possible overflow.
Use curly brackets in stacked if statements throughout.
Use the ? operator to enhance readability when clearing
the FIN flag in tcp_output().
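
Why a dedicated subtraction helper matters for sequence numbers can be sketched as follows. The function seq_sub() is a hypothetical helper written in the spirit of SEQ_SUB; the exact definition of the kernel macro is not reproduced here.

```c
#include <stdint.h>

/*
 * Hypothetical helper in the spirit of SEQ_SUB: subtract two 32-bit
 * sequence numbers modulo 2^32 and reinterpret the result as signed.
 * The unsigned-to-signed conversion relies on the two's-complement
 * behaviour provided by common compilers such as GCC and Clang.
 */
int32_t
seq_sub(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b));
}
```

With this helper, a sequence number that has wrapped past zero still yields a small signed distance: seq_sub(5, 0xfffffff0) is 21 rather than a huge unsigned value.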
Jessica Clarke [Fri, 26 Jan 2024 00:19:02 +0000 (00:19 +0000)]
ldscript.powerpc*: Only put .dynamic in PT_DYNAMIC
Currently there are a few output sections left as implicitly using
:kernel :dynamic before :kernel on its own is used again, which means
they end up in both the PT_LOAD and the PT_DYNAMIC segments, an unusual
situation which the new libelf-based kldxref initially treated as
invalid. Thus, hoist the :kernel to the very next section to ensure only
.dynamic is in PT_DYNAMIC, as is more normal.
Whilst here, sync ldscript.powerpc64le with ldscript.powerpc64 to pick
up various fixes that were presumably made between the start of the
powerpc64le port and it being committed and got missed.
Account for SACK retransmitted bytes once the actual length
is known. This prevents a call to tcp_maxseg() and prepares
for TSO support when transmitting from the SACK scoreboard.
Mark Johnston [Thu, 25 Jan 2024 21:33:46 +0000 (16:33 -0500)]
arm64: Add a VM_FREELIST_DMA32 freelist
When booting a KMSAN kernel on an Ampere Altra, I've seen some boot time
hangs when the XHCI controller driver attempts to allocate memory for
32-bit DMA. The system boots fine with a GENERIC kernel; I believe that
the additional memory requirements of KMSAN push it over the edge. The
system has a bit less than 2GB of RAM below the 4GB boundary.
Allocate a new freelist to segregate memory below 4GB, as we do on
amd64, so that such memory allocation failures are less likely to occur.
Reviewed by: alc
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D43503
Interesting fixes (* were already cherry-picked):
- 03c83f5 add __cxa_init_primary_exception (#23)
* 5d8a158 Fix two bugs in __cxa_end_cleanup()
* b00c6c5 Insert padding in __cxa_dependent_exception
* 45ca8b1 Insert padding in __cxa_exception struct for compatibility
* f2e5509 Fix unlock in two-word version and add missing comment.
- 6229590 Add an option for disabling emergency buffers. (#14)
Jessica Clarke [Wed, 24 Jan 2024 23:49:54 +0000 (23:49 +0000)]
riscv: Convert local interrupt controller to a newbus PIC
Currently the local interrupt controller implementation is based on
pre-INTRNG arm/arm64 code, using hand-rolled event code rather than
INTRNG. This then interacts weirdly with the PLIC, and other future
interrupt controllers like the APLIC and IMSICs in the upcoming AIA
specification, since they become the root PIC despite not being the
logical root. Instead, use a real newbus device for it and register
it as the root PIC.
This also adapts the IPI code to make use of the newly-added INTRNG
generic IPI handling framework, adding a new sbi_ipi as the PIC. In
future there will be alternative devices for sending IPIs that will
register with higher priorities, such as the proposed AIA IMSIC and
ACLINT SSWI.
Jessica Clarke [Wed, 24 Jan 2024 23:49:54 +0000 (23:49 +0000)]
riscv: Create a newbus device for the SBI driver
This approach is based on the Arm PSCI driver, though that makes more
extensive use of its softc than we do here. This will be used to extract
the SBI IPI code as a real PIC.
Jessica Clarke [Wed, 24 Jan 2024 23:49:54 +0000 (23:49 +0000)]
intrng: Allow alternative IPI PICs to be registered and used
On RISC-V, the root PIC (whether the PLIC or, as will be the case in
future, the local interrupt controller) cannot send IPIs, relying on
another means to trigger the necessary software interrupts (firmware
calls), but there are upcoming standard devices that will be able to
inject them, so we can't just put the firmware calls in the root PIC
driver.
Thus, split out a new intr_ipi_dev from intr_irq_root_dev to use for
sending IPIs. New devices can be registered with a given priority up
until the first IPI is set up, when the best device seen so far gets
frozen as the IPI device to use.
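
The register-then-freeze policy described above can be modelled roughly as follows. All names here (struct ipi_dev, ipi_dev_register(), ipi_dev_select()) are hypothetical and chosen for this sketch; they are not the INTRNG interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a candidate IPI sender; not the kernel's types. */
struct ipi_dev {
	const char *name;
	int priority;
};

static const struct ipi_dev *best_dev = NULL;
static bool ipi_dev_frozen = false;

/* Offer a device; returns false once the choice has been frozen. */
bool
ipi_dev_register(const struct ipi_dev *dev)
{
	if (ipi_dev_frozen)
		return (false);
	if (best_dev == NULL || dev->priority > best_dev->priority)
		best_dev = dev;
	return (true);
}

/* Called when the first IPI is set up: freeze and return the winner. */
const struct ipi_dev *
ipi_dev_select(void)
{
	ipi_dev_frozen = true;
	return (best_dev);
}
```

In this model, a device registered after the first call to ipi_dev_select() is refused, however high its priority.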
Jessica Clarke [Wed, 24 Jan 2024 23:49:53 +0000 (23:49 +0000)]
intrng: Extract arm/arm64 IPI->PIC glue code
The arm and arm64 implementations of dispatching IPIs via PIC_IPI_SEND
are almost identical, and entirely MI with the lone exception of a
single store barrier on arm64 (that is likely either redundant or needed
on arm too). Thus, de-duplicate this code by moving it to INTRNG as a
generic IPI glue framework. The ipi_* functions remain declared in MD
smp.h headers and implemented in MD code, but are trivial wrappers
around intr_ipi_send that could be made MI, at least for INTRNG ports,
at a later date.
Note that, whilst both arm and arm64 had an ii_send member in intr_ipi
to abstract over how to send interrupts, they were always ultimately
using PIC_IPI_SEND, and so this complexity has been removed. A follow-up
commit will re-introduce the same flexibility by instead allowing a
device other than the root PIC to be registered as the IPI sender.
As part of this, strengthen a MAXCPU assertion that was missed in commit
2f0b059eeafc ("intrng: switch from MAXCPU to mp_ncpus") (which itself is
mis-titled).
Jessica Clarke [Wed, 24 Jan 2024 23:49:53 +0000 (23:49 +0000)]
intrng: Remove irq_root_ipicount and corresponding intr_pic_claim_root arg
The static irq_root_ipicount variable is only ever written to (with the
value passed to intr_pic_claim_root()), never read. Moreover, the bcm2836
driver, as used by the Raspberry Pi 2B and 3A/B (but not 4, which uses a
GIC-400, though does have the legacy interrupt controller present too)
passes 0 as ipicount, despite implementing IPIs. It's thus inaccurate
and serves no purpose, so should be removed.
Kyle Evans [Wed, 24 Jan 2024 19:36:26 +0000 (13:36 -0600)]
kern: tty: fix recanonicalization
`ti->ti_begin` is actually the offset within the first block that is
unread, so we must use that for our lower bound.
Moving to the previous block has to be done at the end of the loop in
order to correctly handle the case of ti_begin == TTYINQ_DATASIZE. At
that point, lastblock is still the last one with data written and the
next write into the queue would advance lastblock. If we move to the
previous block at the beginning, then we're essentially off by one block
for the entire scan and run the risk of running off the end of the block
queue.
The ti_begin == 0 case is still handled correctly, as we skip the loop
entirely and the linestart gets recorded as the first byte available for
writing. The bit after the loop about moving to the next block is also
still correct, even with both previous fixes in mind: we skipped moving
to the previous block if we hit ti_begin, and `off + 1` would in fact be
a member of the next block from where we're reading if it falls on a
block boundary.
Reported by: dim
Fixes: 522083ffbd1ab ("kern: tty: recanonicalize the buffer on [...]")
Kristof Provost [Wed, 24 Jan 2024 16:34:01 +0000 (17:34 +0100)]
pf: only check MTU for IPv6 packets when forwarding
When the packets are generated locally (i.e. PFIL_FWD is not set) we
might generate overly large packets and rely on the NIC to fragment them
for us. In that case we'd reject a valid packet.
Reported by: Herbert J. Skuhra <herbert@gojira.at>
Tested by: Herbert J. Skuhra <herbert@gojira.at>
Fixes: 54c62e3e5d8cd90c5571a1d4c8c5f062d580480e
Sponsored by: Rubicon Communications, LLC ("Netgate")
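
The decision described above reduces to a small predicate, sketched here under assumptions. The function reject_for_mtu() and its parameters are hypothetical; 'forwarding' models PFIL_FWD being set, and the actual pf code path is not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: only forwarded IPv6 packets are checked against
 * the MTU.  Locally generated packets are let through even when oversized.
 */
bool
reject_for_mtu(size_t pkt_len, size_t mtu, bool forwarding)
{
	return (forwarding && pkt_len > mtu);
}
```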
Ed Maste [Wed, 24 Jan 2024 15:05:09 +0000 (10:05 -0500)]
ccdconfig: remove obsolete references to BSD disklabels
ccd(4) previously had knowledge of BSD disklabels, and relied on their
use on the underlying disks, but this hasn't been the case since 2003
(commit 0f76d6d822f4).
Remove disklabel references from the man page.
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43574
Gleb Smirnoff [Wed, 24 Jan 2024 17:33:27 +0000 (09:33 -0800)]
callout: retire callout_async_drain()
This function was used only in TCP before 446ccdd08e2a. It was born in
pain in 2016 to plug different complex panics in TCP timers. It wasn't
warmly accepted in phabricator by all of the reviewers and my recollection
of overall agreement was that "if you need this KPI, then you'd better fix
your code to not need it". However, the function served its duty well all
the way to FreeBSD 14. But now that TCP doesn't need it anymore, let's
retire it to reduce complexity of callout code and also to avoid its
further use.
tcp: pass maxseg around instead of calculating locally
Improve slowpath processing (reordering, retransmissions)
slightly by calculating maxseg only once. This typically
saves one of two calls to tcp_maxseg().
Ed Maste [Mon, 22 Jan 2024 14:49:02 +0000 (09:49 -0500)]
release: rework distributions list
Components like base.txz and ports.txz are called distributions in the
installer, and with the introduction of pkgbase we will start dealing
with normal pkg packages in the installer. Rename EXTRA_PACKAGES to
DISTRIBUTIONS, and move base.txz and kernel.txz to that list.
This introduces no functional change but is a small cleanup in advance
of some pkgbase experimentation.
Reviewed by: cperciva
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43544
This is based purely on reading the Linux kcmp(2) man page.
In addition to the Linux set of comparators, I also added KCMP_FILEOBJ to
compare the files' underlying objects.
Tested by: manu
Reviewed by: brooks, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43518
The method should return 0 if the files' underlying objects are the same.
In other words, if 0 is returned, I/O through either file modifies the
same object.
Reviewed by: brooks, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43518
Kyle Evans [Wed, 24 Jan 2024 05:00:36 +0000 (23:00 -0600)]
ncurses: serialize the tinfo build a little bit
Move ncurses_dll.h to GENHDRS to start with; it's been generated from
ncurses_dll.h.in for years, so it's not actually in a different category
than all of the other GENHDRS. Slap an .ORDER on it to ensure that we
build ncurses_dll.h and curses.h before any *.c gets compiled.
This should sufficiently address a build race seen downstream where
ncurses_dll.h is present but not yet populated.
Reviewed by: bapt
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D43540
Mike Karels [Tue, 23 Jan 2024 17:23:38 +0000 (11:23 -0600)]
tmpfs: increase vfs.tmpfs.memory_percent to 100 as workaround
The changes to avoid letting tmpfs use all of memory + swap do not
work well with ZFS ARC. The ARC can grow quite large, and will shrink
when there is memory pressure, but tmpfs does not allow for that.
Pending investigation of the right way to handle this, change the
default value of the vfs.tmpfs.memory_percent sysctl to 100 as a
workaround. The sysctl can be set to 95 to get back to the previous
default.
John Baldwin [Tue, 23 Jan 2024 17:38:09 +0000 (09:38 -0800)]
ofw_pcib: Use bus_generic_rman_*
- Implement bus_map/unmap_resource pulling bits from the previous
ofw_pcib_activate/deactivate_resource. One difference here is that
the bus_unmap_resource implementation uses bus_space_unmap instead
of pmap_unmapdev as a complement to the existing use of bus_space_map.
- Use bus_generic_rman_* in various routines for memory and I/O port
resources.
- Use pci_domain_* for PCI_RES_BUS in
ofw_pcib_activate/deactivate_resource.
John Baldwin [Tue, 23 Jan 2024 17:37:53 +0000 (09:37 -0800)]
powerpc: Fix bus_space_unmap
Previously it failed to compile since the macro passed too many
arguments to the function. Fix by adding the bus handle to the
function and adding an implementation that calls pmap_unmapdev.
John Baldwin [Tue, 23 Jan 2024 17:37:13 +0000 (09:37 -0800)]
arm nexus: Use bus_generic_rman_*
- Implement bus_get_rman pulling bits from nexus_alloc_resource.
- Implement bus_map/unmap_resource pulling bits from
nexus_activate/deactivate_resource.
- Use bus_generic_rman_* for
bus_alloc/adjust/activate/deactivate/release_resource except for
custom interrupt activate/deactivate logic still in
nexus_activate/deactivate_resource.
Mark Johnston [Tue, 23 Jan 2024 16:40:52 +0000 (11:40 -0500)]
bhyve: Simplify register definitions a bit
It's awkward to have separate tables for information which is logically
connected. Merge the gdb_regset[] and gdb_regsize[] arrays and update
gdb_read_regs() to cope with the result. This makes the addition of
arm64 support a bit cleaner.
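
Merging two parallel arrays into one table of records might look roughly like this. The struct layout, the sample entries, and gdb_regs_total_size() are assumptions made for the sketch; the real table lists the guest registers and is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical merged entry: register identifier and its size in bytes. */
struct gdb_reg {
	int id;
	size_t size;
};

/* Illustrative values only, not the actual register set. */
static const struct gdb_reg gdb_regset[] = {
	{ 0, 8 },
	{ 1, 8 },
	{ 2, 4 },
};

#define	NREGS	(sizeof(gdb_regset) / sizeof(gdb_regset[0]))

/* Total bytes needed to hold every register, computed from one table. */
size_t
gdb_regs_total_size(void)
{
	size_t i, total = 0;

	for (i = 0; i < NREGS; i++)
		total += gdb_regset[i].size;
	return (total);
}
```

Keeping the identifier and the size in one record means the two can no longer drift out of step when an entry is added or removed.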
Ed Maste [Tue, 23 Jan 2024 02:05:58 +0000 (21:05 -0500)]
bsdlabel: limit to 8 partitions
bsdlabel is intended to support up to 20 partitions, but the disklabel
struct has a d_partitions array with only BSD_NPARTS_MIN (8) entries.
Previously, an attempt to operate on a bsdlabel with more than eight
partitions resulted in a buffer overflow.
As a stopgap, limit bsdlabel to 8 partitions until this is fixed
properly.
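
The kind of bounds check implied by this stopgap can be sketched as follows. struct label, NPARTS_MAX, and both functions are hypothetical stand-ins; the real disklabel structure and the bsdlabel code are not reproduced.

```c
#include <stdbool.h>

#define	NPARTS_MAX	8	/* mirrors BSD_NPARTS_MIN (8) from the text */

/* Hypothetical stand-in for a label with a fixed-size partition array. */
struct label {
	unsigned int npartitions;
	unsigned long sizes[NPARTS_MAX];
};

/* Reject labels that claim more entries than the array can hold. */
bool
label_is_safe(const struct label *lp)
{
	return (lp->npartitions <= NPARTS_MAX);
}

/* Sum partition sizes, refusing to index past the end of the array. */
bool
label_total(const struct label *lp, unsigned long *total)
{
	unsigned int i;

	if (!label_is_safe(lp))
		return (false);
	*total = 0;
	for (i = 0; i < lp->npartitions; i++)
		*total += lp->sizes[i];
	return (true);
}
```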