John Baldwin [Fri, 4 Aug 2023 23:41:05 +0000 (16:41 -0700)]
g_raid concat: Fail requests to read beyond the end of the volume
Previously a debug kernel would trigger an assertion failure if an I/O
request attempted to read off the end of a concat volume, but a
non-debug kernel would use an invalid sub-disk to try to complete the
request eventually resulting in some sort of fault in the kernel.
Instead, turn the assertions into explicit checks that fail requests
beyond the end of the volume with EIO. For requests which run over
the end of the volume, return a short request.
PR: 257838
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41222
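The bounds handling described in this commit can be sketched in userland C as follows. This is a minimal illustration, not the g_raid code: the helper name `concat_clamp` and its parameters are hypothetical, and treating a start offset equal to the volume size as "beyond the end" is an assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the g_raid implementation): clamp a request of
 * `length` bytes starting at `offset` against a volume of `volsize` bytes.
 * Returns 0 and stores the number of serviceable bytes in *donep, or EIO
 * if the request starts at or beyond the end of the volume (assumption).
 */
static int
concat_clamp(uint64_t volsize, uint64_t offset, uint64_t length,
    uint64_t *donep)
{
	if (offset >= volsize)
		return (EIO);			/* entirely beyond the end */
	if (length > volsize - offset)
		length = volsize - offset;	/* runs over the end: short request */
	*donep = length;
	return (0);
}
```

Comparing `length` against `volsize - offset` rather than computing `offset + length` avoids unsigned overflow for very large requests.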
Michael Tuexen [Fri, 4 Aug 2023 06:32:25 +0000 (08:32 +0200)]
sctp: improve consistency of acc and ccc handling in snd buffer
Don't clear the counters for the socket snd buffer when
shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR) is called.
This was causing the system to panic() when SCTP pf tests were
running.
Doug Moore [Fri, 4 Aug 2023 18:41:59 +0000 (13:41 -0500)]
vm_phys_enq_range: no alignment assert for npages==0
Do not assume that when vm_phys_enq_range is passed npages==0 that the
vm_page argument is valid in any way, much less that it has a
page-aligned address. Just don't look at it. Assert nothing about it.
Ed Maste [Wed, 2 Aug 2023 14:37:12 +0000 (10:37 -0400)]
openssh: retire HPN option handling
The HPN patch set was removed from base system SSH in January 2016, in
commit 60c59fad8806. We retained the option parsing (using OpenSSH's
support for deprecated options) to avoid breaking existing installations
upon upgrade, but sufficient time has now passed that we can remove this
special case.
Approved by: des
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41291
Because KASAN shadows the kernel image itself (KMSAN currently does
not), a shadow mapping of the boot stack must be created very early
during boot. pmap_san_enter() reserves a fixed number of pages for the
purpose of creating and mapping this shadow region.
After commit 789df254cc9e ("amd64: Use a larger boot stack"), it could
happen that this reservation is insufficient; this happens when
bootstack crosses a PAGE_SHIFT + KASAN_SHADOW_SCALE_SHIFT boundary.
Update the calculation to take into account the new size of the boot
stack.
Fixes: 789df254cc9e ("amd64: Use a larger boot stack")
Sponsored by: The FreeBSD Foundation
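To see why a larger boot stack can need more shadow pages, consider the arithmetic below. This is only a sketch of the boundary-crossing effect, not the pmap_san_enter() logic; the function name and the constant values (4 KiB pages, one shadow byte per 8 bytes of memory) are assumptions made for illustration.

```c
#include <stdint.h>

#define	SKETCH_PAGE_SHIFT		12	/* assumed: 4 KiB pages */
#define	SKETCH_SHADOW_SCALE_SHIFT	3	/* assumed: 1 shadow byte per 8 bytes */

/*
 * Hypothetical helper: number of shadow pages needed to cover the
 * address range [addr, addr + size).  Each shadow page covers
 * 1 << (SKETCH_PAGE_SHIFT + SKETCH_SHADOW_SCALE_SHIFT) bytes of memory.
 */
static uint64_t
shadow_pages_needed(uint64_t addr, uint64_t size)
{
	const unsigned int shift = SKETCH_PAGE_SHIFT + SKETCH_SHADOW_SCALE_SHIFT;
	uint64_t first, last;

	if (size == 0)
		return (0);
	first = addr >> shift;			/* shadow page of the first byte */
	last = (addr + size - 1) >> shift;	/* shadow page of the last byte */
	return (last - first + 1);
}
```

With these assumed values, a 16 KiB range that starts at 0x8000 fits in one shadow page, while the same range starting at 0x6000 straddles a 32 KiB boundary and needs two.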
The closing parenthesis was in the wrong location, so instead of assigning the return value to krbret and then comparing it to zero, we were assigning the result of the comparison to krbret and then comparing that to zero. This has no practical significance since the value is not used after the loop terminates.
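The difference between the two parenthesizations can be illustrated with a small stand-alone sketch. The function `fake_call` and the variable names are hypothetical stand-ins, not the code touched by the commit.

```c
/* Hypothetical stand-in for a library call that returns an error code. */
static int
fake_call(void)
{
	return (42);
}

/* Intended form: assign the return value, then compare it to zero. */
static int
assign_then_compare(void)
{
	int ret;

	if ((ret = fake_call()) != 0)
		return (ret);		/* ret holds the error code */
	return (0);
}

/* Misplaced parenthesis: ret receives the result of the comparison. */
static int
compare_then_assign(void)
{
	int ret;

	if ((ret = (fake_call() != 0)) != 0)
		return (ret);		/* ret holds 1, not the error code */
	return (0);
}
```

Both forms take the same branch, which is why the bug was harmless as long as the stored value was never used afterwards.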
pf: test rules evaluation in the face of multiple IPv6 fragment headers
Send an ICMPv6 echo request packet with multiple IPv6 fragment headers.
Set rules to pass all packets, except for ICMPv6 echo requests.
pf ought to drop the echo request, but doesn't because it reassembles
the packet, and then doesn't handle the second fragment header. In other
words: it fails to detect the ICMPv6 echo header.
Reported by: Enrico Bassetti <bassetti@di.uniroma1.it> (NetSecurityLab @ Sapienza University of Rome)
MFC after: instant
Sponsored by: Rubicon Communications, LLC ("Netgate")
With 'scrub fragment reassemble' if a packet contains multiple IPv6
fragment headers we would reassemble the packet and immediately
continue processing it.
That is, we'd remove the first fragment header and expect the next
header to be a final header (i.e. TCP, UDP, ICMPv6, ...). However, if
it's another fragment header we'd not treat the packet correctly.
That is, we'd fail to recognise the payload and treat it as if it were
an IPv6 fragment rather than as its actual payload.
Fix this by restarting the normalisation on the reassembled packet.
If there are multiple fragment headers drop the packet.
Reported by: Enrico Bassetti <bassetti@di.uniroma1.it> (NetSecurityLab @ Sapienza University of Rome)
MFC after: instant
Sponsored by: Rubicon Communications, LLC ("Netgate")
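The condition this fix guards against, more than one fragment header in a packet, can be sketched as a simple header walk. This is not pf's normalisation code: the function name is hypothetical, and the sketch only counts fragment headers that directly follow one another, ignoring any other extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define	NH_FRAGMENT	44	/* IPv6 Next Header value for the Fragment header */
#define	FRAG_HDR_LEN	8	/* the Fragment header has a fixed 8-byte length */

/*
 * Hypothetical helper: count back-to-back Fragment headers.  `first_nh`
 * is the Next Header value from the IPv6 header and `buf` points at the
 * bytes following it.  The first byte of each Fragment header is its own
 * Next Header field.
 */
static int
count_leading_frag_headers(uint8_t first_nh, const uint8_t *buf, size_t len)
{
	uint8_t nh = first_nh;
	size_t off = 0;
	int count = 0;

	while (nh == NH_FRAGMENT && len - off >= FRAG_HDR_LEN) {
		count++;
		nh = buf[off];		/* Next Header of this Fragment header */
		off += FRAG_HDR_LEN;
	}
	return (count);
}
```

Under the policy described above, a caller would drop the packet whenever the count exceeds one.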
Dmitry Chagin [Fri, 4 Aug 2023 13:03:57 +0000 (16:03 +0300)]
linux(4): Add dedicated ioprio system calls
On Linux these system calls have an effect only when used in conjunction
with an I/O scheduler that supports I/O priorities. If no I/O scheduler
has been set for a thread, then by default the I/O priority will follow
the CPU nice value. Due to FreeBSD's lack of I/O scheduler facilities, the
default Linux behavior is implemented.
Ubuntu 23.04 debootstrap requires Linux ionice which depends on these
syscalls.
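For reference, the Linux ioprio_set(2) manual page documents the default mapping from the CPU nice value to an I/O priority level as `(cpu_nice + 20) / 5`. The sketch below shows that arithmetic only; the function name is hypothetical, and whether linux(4) uses exactly this expression is an assumption.

```c
/*
 * Hypothetical helper: map a CPU nice value (-20..19) to a Linux I/O
 * priority level (0 = highest, 7 = lowest), following the formula given
 * in the Linux ioprio_set(2) manual page.
 */
static int
nice_to_ioprio_level(int nice)
{
	return ((nice + 20) / 5);
}
```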
Andrew Turner [Fri, 4 Aug 2023 09:14:16 +0000 (10:14 +0100)]
Remove MAXCPUS from the gicv3 driver
We create a static array of pointers to per-CPU data. Because the cpuid
space on arm64 is not sparse there is no need to add an extra level of
indirection. Move to use mallocarray to allocate the redistributors as
a single array.
libibverbs: remove nonexistent symbols from the linker map
The function ibv_query_device_ex is static inline, so it is not exported
from the DSO. With lld 16, which is much more picky about versioning and
undefined symbols, this becomes an error.
The ibv_register_driver driver symbol is explicitly versioned in
sources, it is non-existent in un-versioned object files.
This performs very well. x86-64-v3 and x86-64-v4 kernels were written,
too, but performed worse than the baseline kernel on short strings.
These may be added at a future point in time if the performance issues
can be fixed.
Kevin Bowling [Thu, 3 Aug 2023 20:49:15 +0000 (13:49 -0700)]
e1000: Enable TSO for lem(4) and em(4)
Most em(4) devices now enjoy TSO and TSO6, matching NetBSD and Linux
defaults.
A prior commit automasks TSO on 10/100 Ethernet due to errata, and other
bugs affecting IPv6 were fixed recently, which allows this change.
Mike Karels identified a performance anomaly on Intel 82574L devices.
These are multiqueue enabled on FreeBSD since the conversion to
iflib. I am investigating whether this can be fixed; in the meantime
MSI-X with checksum offloads remains the default.
i219 SPT devices have an erratum that downclocks the DMA engine, which
results in TSO not being able to achieve line rate. Therefore, it is
disabled on:
* Intel(R) I219-LM and I219-V SPT
* Intel(R) I219-LM and I219-V SPT-H (2)
* Intel(R) I219-LM and I219-V LBG (3)
* Intel(R) I219-LM and I219-V SPT (4)
* Intel(R) I219-LM and I219-V SPT (5)
Many lem(4) devices enjoy TSO, exceptions being 82542, 82543, 82547.
TSO6 may be possible for some chipsets but I am still working through
my testing matrix and that is hidden behind hw.em.unsupported_tso.
If you encounter issues, you may disable TSO with for example:
ifconfig em0 -tso -tso6.
I ask to be informed of any deviations from normal operation requiring
this.
Thanks to cc@ for access to emulab.net.
On a sample I219 system it saves about 16% CPU on IPv4 and 19% on IPv6.
Ed Maste [Fri, 30 Sep 2022 12:14:22 +0000 (08:14 -0400)]
amd64: Bump MAXCPU to 1024 (from 256)
Hardware with more than 256 CPU cores is currently available and will
become increasingly common over FreeBSD 14's lifetime. Increase MAXCPU
in the amd64 GENERIC kernel configuration to 1024.
Earlier commits increased some related limits. These prerequisite
commits include at least:
- d7ed40243769 Increase MAX_APIC_ID safeguard to 0x800
- d1639e43c589 cpuset: increase userland maximum size to 1024
Global and allocated arrays sized by MAXCPU result in excessive bloat
on systems with lower core counts. In addition, some code used u_char
(8 bits) to hold a CPU index, which is not valid if MAXCPU is greater
than 256.
A number of recent commits addressed these sorts of issues, including
at least:
- 133935d26f20 pf: atomically increment state ids
- 74ac712f72cf vmm: Dynamically allocate a couple of per-CPU state save areas
- 78cfa762ebf2 callout: Move per-CPU callout state into the dpcpu region
- 42f722e721cd amd64: store pcids pmap data in pcpu zone
- 9801e7c275f6 smp_topo: dynamically allocate group array
- 9fb6718d1b18 smp: Dynamically allocate the stoppcbs array
- 2bb16c635249 x86: retire use of intr_bind
There are some additional allocations still to be converted and
more scalability work is required to make effective use of very high
core count systems, but this change allows us to boot on these systems
and provides a Kernel Binary Interface (KBI) for the FreeBSD 14 release
that supports these configurations.
Special thanks to AMD for providing hardware to test these changes.
PR: 269572
Reviewed by: des
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36838
`intr_bind(u_int vector, u_char cpu);` looked suspicious since
everywhere else "cpu" is a u_int and >256 processors isn't unreasonable
now. `intr_bind()` is not used anywhere in FreeBSD (now, after commit bf42f3738087). Time to remove.
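The hazard of an 8-bit CPU parameter is easy to demonstrate. The sketch below is a userland model, not the removed kernel function; the helper name is hypothetical and it assumes an 8-bit `unsigned char`, which the static assertion checks.

```c
#include <limits.h>

_Static_assert(CHAR_BIT == 8, "sketch assumes 8-bit unsigned char");

/*
 * Hypothetical model of passing a CPU index through an 8-bit parameter:
 * only the low 8 bits of the index survive the conversion.
 */
static unsigned int
cpu_through_u_char(unsigned int cpu)
{
	unsigned char narrow = (unsigned char)cpu;

	return (narrow);
}
```

With more than 256 CPUs, an index such as 300 would silently alias CPU 44.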
Steve Kargl [Thu, 3 Aug 2023 19:51:17 +0000 (21:51 +0200)]
Clean up libm use of the __ieee754_ prefix
This removes the __ieee754_ prefix from a number of the math functions.
msun/src/math_private.h contains the statement that
/*
* ieee style elementary functions
*
* We rename functions here to improve other sources' diffability
* against fdlibm.
*/
#define __ieee754_sqrt sqrt
...
Here, fdlibm refers to https://netlib.org/fdlibm. It is seen from
https://netlib.org/fdlibm/readme that this prefix was used to
differentiate between different standards:
Wrapper functions will twist the result of the ieee754
function to comply to the standard specified by the value
of _LIB_VERSION
if _LIB_VERSION = _IEEE_, return the ieee754 result;
if _LIB_VERSION = _SVID_, return SVID result;
if _LIB_VERSION = _XOPEN_, return XOPEN result;
if _LIB_VERSION = _POSIX_, return POSIX/ANSI result.
(These are macros, see fdlibm.h for their definition.)
AFAICT, FreeBSD has never supported these wrappers. In addition, as C99,
principally the long double, functions were added to libm, this
convention was not maintained. Given that only 148 of 324 files under
lib/msun contain a "Copyright (C) 1993 by Sun Microsystems" statement,
the removal of the __ieee754_ prefix provides consistency across all
source files.
The last time someone compared lib/msun to fdlibm appears to be
Reduce diffs against vendor source (Sun fdlibm 5.3).
The most recent fdlibm RCS string that appears in a Sun Microsystem
copyrighted file is date "95/01/18". With Oracle Corporation's
acquisition of Sun Microsystems in 2009, it is unlikely that fdlibm will
ever be updated. A search for fdlibm at https://opensource.oracle.com/
yields no hits.
Finally, OpenBSD removed the use of this prefix over 21 years ago. See
revision 1.6 of OpenBSD's math_private.h.
Note: this does not drop the __ieee754_ prefix from the trigonometric
argument reduction functions, e.g., __ieee754_rem_pio2. These functions
are internal to the libm and exported through Symbol.map; and thus,
reserved for the implementation.
spibus(4): Add support for ACPI-based children enumeration
When spibus is attached as child of Intel SPI controller it scans all
ACPI nodes for "SPI Serial Bus Connection Resource Descriptor" described
in section 19.6.126 of ACPI specs.
If such a descriptor is found, an SPI child is added to spibus, and its SPI
chip select, mode, clock, IRQ resource and ACPI handle are added to ivars.
Existing ACPI bus-hosted child is deleted afterwards.
Apple ACPI SPI extensions are supported.
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D41248
intelspi: Add support for ddb/kdb -compatible polled mode
Required for Apple and Microsoft -compatible HID-over-SPI drivers.
Most logic was already implemented in commit 3c0867343819
"spibus: extend API: add cs_delay ivar, KEEP_CS and NO_SLEEP flags".
It disallowed driver sleeps in the interrupt context. This commit
extends this feature to handle ddb/kdb context with the following:
- Skip driver locking if SPI functions were called from kdb/ddb.
- Reinitialize controller if kdb/ddb initiated SPI transfer has
interrupted another already running one. This does not work very
reliably yet.
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D41247
Some devices, like Apple HID-over-SPI, may contain more than one report
descriptor, necessitating the creation of multiple hidbus children.
Add an identifier for child devices to distinguish them.
No functional changes intended.
Doug Moore [Thu, 3 Aug 2023 14:19:48 +0000 (09:19 -0500)]
vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.
Mitchell Horne [Thu, 3 Aug 2023 13:48:15 +0000 (10:48 -0300)]
intro(9): rewrite from scratch
This page has existed as a placeholder since its creation in 1995. It
does not provide a useful introduction to the content in this section.
Reimagine it as a top-level overview page containing brief descriptions
and links to existing pages in section 9. It is roughly organized into
sub-sections, grouped by topic or subsystem. In other words, the page is
meant to function as a map to other content.
There is a balance to be found here between providing as many links as
possible and keeping the page concise and searchable. In general the aim
is to reference pages which provide the best entry point to a particular
topic. For example, a link is given to locking(9), but not to the
specific lock pages such as mutex(9) or rwlock(9).
NetBSD has done something similar with their intro(9), so some
inspiration has been taken from there, although their content doesn't
align that closely with what we have.
I have done a thorough review of our existing pages and formed these
subsections around them, but they are meant to evolve.
PR: 270481
Reviewed by: imp, emaste
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41104
Kevin Bowling [Thu, 3 Aug 2023 05:47:15 +0000 (22:47 -0700)]
e1000: Automask TSO on lem(4)/em(4) 10/100 Ethernet
This feature masks TSO capability when a link comes up at 10 or 100mbit
due to errata on the chips. This behavior matches previous versions of
FreeBSD as well as NetBSD and Linux.
A tunable, hw.em.unsupported_tso, may be set if the admin desires to
disable automasking and configure TSO settings manually.
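The automasking behavior can be sketched as a small capability filter. This is an illustration under stated assumptions, not the e1000 driver code: the capability bits, the function name, and the way the tunable is passed in are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define	CAP_TSO4	0x1u	/* hypothetical capability bit for TSO over IPv4 */
#define	CAP_TSO6	0x2u	/* hypothetical capability bit for TSO over IPv6 */

/*
 * Hypothetical helper: return the capabilities to advertise for a link
 * that came up at `link_mbps`.  TSO is masked at 10 or 100 Mbit unless
 * the administrator opted out via a tunable (modelled by
 * `unsupported_tso`).
 */
static uint32_t
effective_caps(uint32_t caps, unsigned int link_mbps, bool unsupported_tso)
{
	if (!unsupported_tso && (link_mbps == 10 || link_mbps == 100))
		caps &= ~(CAP_TSO4 | CAP_TSO6);
	return (caps);
}
```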
Steve Kargl [Mon, 31 Jul 2023 22:34:48 +0000 (01:34 +0300)]
Fixes for bugs in sinpi/cospi/tanpi
patch to fix half-cycle trigonometric functions
Paul Zimmermann, a MPFR developer, contacted me about large errors in
the half-cycle trigonometric functions. I have investigated these
issues and developed the attached patch. The float, double, and ld80
(long double) changes have been tested.
Caveat emptor: The ld128 changes have not been compiled. The ld128
changes have not been tested. I do not have access to a system that
uses ld128 floating point.
Here is an itemized list of changes:
* lib/msun/src/math_private.h:
. Add fast floor macros to compute the integer part of |x| for
0 <= |x| < 0x1p(N-1), where N is the precision of the type of x.
These macros are used in the half-cycle trigonometric functions
(e.g., sinpi(x)).
. The FFLOOR80 macro is used with the Intel 80-bit extended double
functions. This macro corrects an off-by-one error, which led to
enormous error for |x| > 0x1p32.
* lib/msun/src/s_cospif.c:
* lib/msun/src/s_cospi.c:
* lib/msun/ld80/s_cospil.c:
. Update Copyright years.
. Use FFLOOR*() macro to get integer part of |x|.
. Correctly handle the range 0x1p(N-1) <= |x| < 0x1pN. Here, one needs
to determine if the integral value of |x| is even or odd to choose
+1 or -1. If |x| >= 0x1pN, always return +1.
* lib/msun/src/s_sinpif.c:
* lib/msun/src/s_sinpi.c:
* lib/msun/ld80/s_sinpil.c:
. Update Copyright years.
. Use FFLOOR*() macro to get integer part of |x|.
* lib/msun/src/s_tanpif.c:
* lib/msun/src/s_tanpi.c:
* lib/msun/ld80/s_tanpil.c:
. Update Copyright years.
. For +-0.5, return +-inf. Previously, tanpi[fl]() returned an NaN.
. Use FFLOOR*() to get integer part of |x|. Need to determine if the
integer part is even or odd. This is used to set +-0 for |x| integral
and +-inf for (n+1/2).
. For 0x1p(N-1) <= |x| < 0x1pN need to determine if x is an even or odd
integer to select +0 or -0. For |x| >= 0x1pN, it is always an even
integer, select 0.
. Note, tanpi[fl](x) is an odd function, so one needs to consider
tanpi[fl](-|x|) = - tanpi[fl](|x|).
* lib/msun/ld128/s_cospil.c:
* lib/msun/ld128/s_sinpil.c:
* lib/msun/ld128/s_tanpil.c:
. Update Copyright years.
. These routines use an FFLOOR128 macro, which likely should be
replaced by a bit twiddling algorithm.
. The same considerations above are applied to 0x1p112 <= |x| < 0x1p113,
and |x| >= 0x1p113 cases.
. Note, even and odd determination used fmodl(x,2.), which is likely
slow.
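The large-argument handling for cospi described in the list above can be sketched for double precision (N = 53) as follows. This is an illustration, not the libm code: the function name is hypothetical, and the parity test uses an integer conversion rather than the FFLOOR*() macros or fmodl() mentioned in the commit.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: cos(pi * x) for a finite double with
 * |x| >= 0x1p52.  In that range every representable value is an
 * integer, so the result is +1 for even |x| and -1 for odd |x|.
 * Precondition (not checked here): x is finite and |x| >= 0x1p52.
 */
static double
cospi_large(double x)
{
	double ax = x < 0 ? -x : x;

	if (ax >= 0x1p53)
		return (1.0);	/* spacing is at least 2: always an even integer */
	/* 0x1p52 <= ax < 0x1p53: ax is an integer that fits in 64 bits. */
	return (((uint64_t)ax & 1) == 0 ? 1.0 : -1.0);
}
```

Because cospi is an even function, the sign of x is discarded before the parity test.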
Steve Kargl [Mon, 31 Jul 2023 22:32:54 +0000 (01:32 +0300)]
Cleanup debugging code in libm
David Schultz (das@) committed Bruce Evans' (bde's) WIP code for
expl() and logl() in git revision 25a4d6bfda29119. That code
included instrumentation that allowed bde to generate pari
scripts used in testing/debugging. This patch removes that
instrumentation as it is unlikely that others will ever use it.
* math/libm/msun/src/math_private.h:
. Remove bde's macros for the generation of pari scripts.
* math/libm/msun/ld128/s_expl.c:
* math/libm/msun/ld128/s_logl.c:
* math/libm/msun/ld80/s_expl.c:
* math/libm/msun/ld80/s_logl.c:
. Remove bde's DOPRINT_START macro.
. Change RETURNP to RETURNF.
. Change RETURN2P to RETURNF. Adjust arguments as needed.
. Change RETURNPI to RETURNI.
. Change RETURN2PI to RETURNI. Adjust arguments as needed.
Ed Maste [Wed, 2 Aug 2023 14:18:33 +0000 (10:18 -0400)]
ssh: comment deprecated option handling for retired local patches
Older versions of FreeBSD included the HPN patch set and provided
client-side VersionAddendum. Both of these changes have been retired
but we've retained the option parsing for backwards compatibility to
avoid breaking upgrades. Add comment references to the relevant
commits.
Mark Johnston [Wed, 2 Aug 2023 13:24:06 +0000 (09:24 -0400)]
ObsoleteFiles.inc: Add an entry for libdtrace.so.2 in /usr/lib
There was a window between commits 4ae699122810 ("dtrace: Add
WITH_DTRACE_ASAN") and 848ff9bc1b97 ("libdtrace: Explicitly set SHLIBDIR
and SHLIB_MAJOR") where libdtrace.so.2 was being installed to /usr/lib
instead of /lib.
Mark Johnston [Tue, 1 Aug 2023 21:58:42 +0000 (17:58 -0400)]
kdb: Permit a NULL thread credential in kdb_backend_permitted()
Early during boot, thread0 runs with td->td_ucred == NULL. This is
fixed up in proc0_init() at SI_SUB_INTRINSIC. If a panic occurs before
then, rather than dereference a NULL pointer, simply allow the thread to
enter KDB.
Doug Moore [Wed, 2 Aug 2023 03:12:00 +0000 (22:12 -0500)]
vm_phys_enqueue_contig: handle npages==0
By letting vm_phys_enqueue_contig handle the case when npages == 0,
the callers can stop checking it, and the compiler can stop
zero-checking with every call to ffs(). Letting vm_phys_enqueue_contig
call vm_phys_enqueue_contig for part of its work also saves a few
bytes.
Ed Maste [Tue, 1 Aug 2023 12:48:02 +0000 (08:48 -0400)]
retire SHARED_TOOLCHAIN knob
Toolchain components were historically statically linked. They became
normal dynamically linked executables in commit 6ab18ea64d19. There is
no need to keep a special case build option for the toolchain; users who
want statically linked toolchain (or any other) components can use the
existing NO_SHARED knob.
Reviewed by: dim, sjg
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41266
John Baldwin [Tue, 1 Aug 2023 22:20:53 +0000 (15:20 -0700)]
mmc_xpt: Remove dubious end of mmc_print_ident
The end of this function finishes the passed in sbuf, calls printf
manually on the contents, and then clears it. The caller then tries
to print the resulting sbuf. This works currently but will not work
for future callers that pass in an external sbuf to be appended to.
John Baldwin [Tue, 1 Aug 2023 22:20:25 +0000 (15:20 -0700)]
cam xpt_*nounce_periph*: Various fixes for periphs without a protocol
If the periph doesn't have a valid protocol, these routines emit
fallback messages. However, the fallback messages duplicated the
periph name and unit number, and in the case of *denounce* included a
spurious newline.