Mark Johnston [Thu, 4 Jan 2024 13:34:31 +0000 (08:34 -0500)]
targ: Handle errors from suword()
In targstart() we are already handling an error and have no good way to
signal the failure to upper layers, so ignore the return value of
suword() there.
This is in preparation for annotating copyin() and related functions
with __result_use_check.
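A minimal sketch of the pattern, assuming __result_use_check maps to the GCC/Clang warn_unused_result attribute. store_word() is a hypothetical stand-in for suword(). Clang accepts a (void) cast as an explicit discard, but GCC may still warn about it.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Assumption: mirrors how FreeBSD's <sys/cdefs.h> defines the annotation
 * for GCC/Clang; defined here only so the sketch builds elsewhere.
 */
#ifndef __result_use_check
#define __result_use_check __attribute__((__warn_unused_result__))
#endif

/* Hypothetical stand-in for suword(): 0 on success, -1 on a fault. */
static int __result_use_check
store_word(long *uaddr, long value)
{
	if (uaddr == NULL)
		return (-1);
	*uaddr = value;
	return (0);
}

/* Normal path: the failure is reported to the caller. */
static int
report_status(long *uaddr, long status)
{
	if (store_word(uaddr, status) != 0)
		return (EFAULT);
	return (0);
}

/*
 * Error path, as in targstart(): an error is already being handled and a
 * second one has nowhere to go, so the result is discarded explicitly.
 */
static int
fail_path(long *uaddr, int error)
{
	(void)store_word(uaddr, -1);
	return (error);
}
```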
Olivier Certner [Fri, 24 Nov 2023 21:21:16 +0000 (22:21 +0100)]
libthr: thr_attr.c: EINVAL, not ENOTSUP, on invalid arguments
On first read, POSIX may seem ambiguous about the return code for some
scheduling-related pthread functions on invalid arguments. But a more
thorough reading and a bit of standards archeology strongly suggest
that this case should be handled by EINVAL and that ENOTSUP is reserved
for implementations providing only part of the functionality required by
the POSIX option POSIX_PRIORITY_SCHEDULING (e.g., if an implementation
doesn't support SCHED_FIFO, it should return ENOTSUP on a call to, e.g.,
sched_setscheduler() with 'policy' SCHED_FIFO).
This reading is supported by the second sentence of the very definition
of ENOTSUP, as worded in CAE/XSI Issue 5 and POSIX Issue 6: "The
implementation does not support this feature of the Realtime Feature
Group.", and the fact that an additional ENOTSUP case was added to
pthread_setschedparam() in Issue 6, which introduces SCHED_SPORADIC,
saying that pthread_setschedparam() may return it when attempting to
dynamically switch to SCHED_SPORADIC on systems that don't support
that.
glibc, illumos and NetBSD also support that reading by always returning
EINVAL, and OpenBSD as well, since it always returns EINVAL but the
corresponding code has a comment suggesting returning ENOTSUP for
SCHED_FIFO and SCHED_RR, which it effectively doesn't support.
Additionally, always returning EINVAL fixes inconsistencies where EINVAL
would be returned on some out-of-range values and ENOTSUP on others.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43006
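The rule argued above can be captured in a small validator. This is a hypothetical sketch, not libthr's code. The fifo_supported parameter models an implementation that provides only part of the Priority Scheduling option.

```c
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical validator: EINVAL for values that are not a scheduling
 * policy at all, ENOTSUP only for a valid policy the implementation
 * chooses not to support.
 */
static int
validate_policy(int policy, bool fifo_supported)
{
	switch (policy) {
	case SCHED_OTHER:
	case SCHED_RR:
		return (0);
	case SCHED_FIFO:
		/* Valid policy, possibly unsupported by this implementation. */
		return (fifo_supported ? 0 : ENOTSUP);
	default:
		/* Out-of-range value: an invalid argument. */
		return (EINVAL);
	}
}
```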
Ruslan Bukin [Thu, 4 Jan 2024 09:35:00 +0000 (09:35 +0000)]
Setups with digital audio connections like SPDIF and ADAT require
a designated master clock to stay in sync. Add a sysctl setting
to control the preferred clock source for each HDSPe sound card.
Complement this with sysctl values to list available clock sources,
show the currently effective clock source and display the sync
status of all connections. Clock sources are named according to
RME user manuals.
Ricardo Branco [Wed, 3 Jan 2024 20:17:58 +0000 (21:17 +0100)]
hexdump: Do not trust st_size if it equals zero.
Fix for hexdump -s not being able to skip files residing in
pseudo-filesystems that advertise a zero size value.
Historically, many pseudofs-based filesystems (e.g., procfs) report
a va_size of 0 for numerous files classified as regular files.
Typically, the contents of these files are generated on demand
from kernel data as sbuf(9) strings at the time they are read.
Accurately reporting the size of these files is challenging, as it
often involves generating their contents. These pseudofs implementations
frequently report the size as 0. This is a historical behavior and also
aligns with Linux behavior. To maintain compatibility, we have chosen
to preserve the existing behavior and address it in the userland
application, rather than modifying it in the kernel (by updating the
correct value for va_size).
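Below is a minimal sketch of the userland side of the fix. It is not hexdump's actual code. st_size is trusted only for regular files that report a non-zero size. Otherwise the bytes are consumed by reading. The helper name skip_input() is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Skip 'skip' bytes of input.  Returns how many bytes are still left to
 * skip (non-zero if this file was shorter than 'skip').
 */
static off_t
skip_input(FILE *fp, off_t skip)
{
	struct stat sb;

	if (fstat(fileno(fp), &sb) == 0 && S_ISREG(sb.st_mode) &&
	    sb.st_size != 0) {
		/* Trusted size: skip the whole file without reading it. */
		if (skip >= sb.st_size)
			return (skip - sb.st_size);
		if (fseeko(fp, skip, SEEK_SET) == 0)
			return (0);
	}
	/* Size of 0 (e.g. procfs) or not a regular file: read and discard. */
	while (skip > 0 && getc(fp) != EOF)
		skip--;
	return (skip);
}
```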
Kyle Evans [Wed, 3 Jan 2024 22:17:59 +0000 (16:17 -0600)]
bhyveload: hold /boot and do relative lookups for the loader
The next change will push bhyveload into capability mode right after we
allocate vcpu state, before we've set up or entered the loader, to limit
the surface area that a rogue loader script can touch.
With an explicit -l loader, we don't need to preopen /boot because
changing interpreters isn't allowed. We'll just dlopen() entirely in
advance in that case to eliminate some complexity.
Kyle Evans [Wed, 3 Jan 2024 22:17:59 +0000 (16:17 -0600)]
bhyveload: use a dirfd to support -h
Don't allow lookups from the loader scripts, which in rare cases may be
in guest control depending on the setup, to leave the specified host
root. Open the root dir and strictly do RESOLVE_BENEATH lookups from
there.
cb_open() has been restructured a bit to work nicely with this, using
fdopendir() in the directory case and just using the fd we already
opened in the regular file case.
hostbase_open() was split out to provide an obvious place to apply
rights(4) if that's something we care to do.
Navdeep Parhar [Tue, 2 Jan 2024 21:20:45 +0000 (13:20 -0800)]
cxgbe(4): Fix virtual interface reattach.
Replace the DOOMED flag with a transient DETACHING flag that is cleared
when the VI is detached. This fixes VI reattach when only the VI and not
the parent nexus is detached. The old flag was never cleared and
prevented subsequent synch ops related to the VI.
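A toy model of why a transient flag fixes reattach. The flag and function names are hypothetical, not cxgbe(4)'s definitions. A sticky DOOMED-style flag would still be set after detach and would keep refusing synchronized operations. The transient flag is cleared once detach completes.

```c
#include <errno.h>

#define VI_INIT_DONE	0x01
#define VI_DETACHING	0x02	/* transient: set only while detach runs */

struct vi_state {
	int flags;
};

/* Synchronized operations are refused only while a detach is running. */
static int
begin_synch_op(const struct vi_state *vi)
{
	return ((vi->flags & VI_DETACHING) ? ENXIO : 0);
}

static void
vi_detach(struct vi_state *vi)
{
	vi->flags |= VI_DETACHING;
	/* ... tear down queues, interrupts and the ifnet here ... */
	vi->flags &= ~(VI_DETACHING | VI_INIT_DONE);
}
```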
Warner Losh [Tue, 2 Jan 2024 16:47:10 +0000 (09:47 -0700)]
make_check: Deorbit fmake support
We don't need make_check to work in a fmake world anymore (nor have we
in the past decade). Just remove it here.
Note in passing it's been 10 years since we've added a new test here and
maybe we're past the need for this part of the build (or need to revamp
it to include all the features added to bmake since 2016 that the build
system silently depends on).
Warner Losh [Tue, 2 Jan 2024 16:43:44 +0000 (09:43 -0700)]
sys.mk: Remove support for building with fmake on modern systems.
We used to provide a lot of extra hooks to allow for local
customizations of the build which couldn't be done outside of sys.mk,
but excluded that support for fmake. Remove those hacks.
Warner Losh [Tue, 2 Jan 2024 16:38:27 +0000 (09:38 -0700)]
bsd.port.mk: No need to support fmake anymore
There's no need to support fmake anymore. Always assume we can use
bmake's :tA modifier. The ports tree hasn't supported fmake in about a
decade anyway. Simplify here.
Warner Losh [Tue, 2 Jan 2024 16:33:19 +0000 (09:33 -0700)]
bsd.own.mk: Assume a modern make
Commit 83cb5bae966d7 added a check for MAKE_VERSION being new enough to
handle CTFCONVERT_CMD being an empty string since fmake of the time
didn't support it until just a few commits before 83cb5bae966d7. Later,
it was augmented with a check for .PARSEDIR to see if bmake was
running. fmake and bootstrapping from fmake haven't worked in maybe 6 or
8 years, so we can remove the check here. If you want to update from
your FreeBSD 7 or FreeBSD 8 systems, you're even more out of luck than
you were before and must jump to an older version before jumping to
current for the source upgrade path.
Warner Losh [Tue, 2 Jan 2024 16:17:21 +0000 (09:17 -0700)]
Makefile: Deorbit fmake support
fmake has been out of the tree for 10 years / 5 major releases now. The
need to bootstrap from it has been gone for at least 6 if not 8
years. While we may still need to bootstrap bmake, we don't need to do
it from fmake, so only retain the infrastructure to update from bmake to
bmake. Retain, for now, the WANT_MAKE_VERSION stuff, though we're always
up to date when building from supported and quasi-supported platforms.
Also remove all the checks to see if .PARSEDIR is defined. It is always
defined and was an early, fail-safe way to tell fmake from bmake during
the transition.
Adjust comments that refer to old fmake and remove those no longer
relevant.
Jose Luis Duran [Sat, 28 Oct 2023 00:28:52 +0000 (00:28 +0000)]
traceroute: Implement ECN bleaching detection
Explicit Congestion Notification (ECN) is a mechanism that allows
end-to-end notification of network congestion without dropping packets
by explicitly setting the ECN code point (2 bits).
Per RFC 8087, section 3.5, network devices should not be configured to
change the ECN code point in the packets that they forward, except to
set the CE (Congestion Experienced) code point ('11') to signal
incipient congestion.
The current commit adds an -E flag to traceroute that crafts a packet
with an ECT(1) code point ('01').
If the packet is received back with a zero ECN code point ('00'), it
outputs that the hop in question erases or "bleaches" the ECN code point
values. Bleaching may occur for various reasons (including normalizing
packets to hide which equipment supports ECN). This policy prevents the
use of ECN by applications.
If the packet is received back with an all-ones ECN code point ('11'),
it outputs that the hop in question is experiencing "congestion".
If the packet is received back with a different ECN code point ('10'),
it outputs that the hop in question changes or "mangles" the ECN code
point values.
If the packet is received with the same ECN code point that was sent
('01'), it outputs that the hop has "passed" the ECN bits appropriately.
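The classification above amounts to a switch on the two ECN bits. This sketch assumes the TOS byte is taken from the probe's IP header as quoted back in the ICMP error. The code point values are those of RFC 3168: ECT(1) '01', ECT(0) '10', CE '11'. The returned strings are illustrative, not traceroute's exact output.

```c
#include <stdint.h>

#define ECN_NOT_ECT	0x00
#define ECN_ECT1	0x01
#define ECN_ECT0	0x02
#define ECN_CE		0x03
#define ECN_MASK	0x03	/* ECN = two low-order bits of the TOS byte */

/* Classify the ECN field of a probe that was sent with ECT(1). */
static const char *
classify_ecn(uint8_t tos)
{
	switch (tos & ECN_MASK) {
	case ECN_ECT1:
		return ("passed");
	case ECN_NOT_ECT:
		return ("bleached");
	case ECN_CE:
		return ("congestion");
	default:		/* ECT(0): a different code point */
		return ("mangled");
	}
}
```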
Jose Luis Duran [Fri, 27 Oct 2023 23:59:28 +0000 (23:59 +0000)]
traceroute6: Implement ECN bleaching detection
Explicit Congestion Notification (ECN) is a mechanism that allows
end-to-end notification of network congestion without dropping packets
by explicitly setting the ECN code point (2 bits).
Per RFC 8087, section 3.5, network devices should not be configured to
change the ECN code point in the packets that they forward, except to
set the CE (Congestion Experienced) code point ('11') to signal
incipient congestion.
The current commit adds an -E flag to traceroute6 that crafts a packet
with an ECT(1) code point ('01').
If the packet is received back with a zero ECN code point ('00'), it
outputs that the hop in question erases or "bleaches" the ECN code point
values. Bleaching may occur for various reasons (including normalizing
packets to hide which equipment supports ECN). This policy prevents the
use of ECN by applications.
If the packet is received back with an all-ones ECN code point ('11'),
it outputs that the hop in question is experiencing "congestion".
If the packet is received back with a different ECN code point ('10'),
it outputs that the hop in question changes or "mangles" the ECN code
point values.
If the packet is received with the same ECN code point that was sent
('01'), it outputs that the hop has "passed" the ECN bits appropriately.
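For IPv6 the same two bits are the low-order bits of the 8-bit Traffic Class. The Traffic Class sits between the 4-bit version and the 20-bit flow label in the header's first 32-bit word (ip6_flow, network byte order). The helper names below are hypothetical, and this is only a sketch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <stdint.h>

/* Extract the ECN bits from an IPv6 header. */
static uint8_t
ip6_ecn(const struct ip6_hdr *ip6)
{
	uint32_t word = ntohl(ip6->ip6_flow);
	uint8_t tclass = (word >> 20) & 0xff;

	return (tclass & 0x03);
}

/* Build the first header word for version 6 with a given traffic class. */
static uint32_t
ip6_first_word(uint8_t tclass, uint32_t flowlabel)
{
	return (htonl((6U << 28) | ((uint32_t)tclass << 20) |
	    (flowlabel & 0xfffff)));
}
```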
Brooks Davis [Wed, 3 Jan 2024 16:39:53 +0000 (16:39 +0000)]
posixshm largepage_mmap: fix a racy test
You can't ever safely map a single page and then map a superpage sized
mapping over it with MAP_FIXED. Even in a single-threaded program, ASLR
might mean you land too close to another mapping and on CheriBSD we
don't allow the initial reservation to grow because doing so requires
program changes that are hard to automate.
To avoid this, map the entire region we want to use upfront.
vm_page_reclaim_contig(): update comment to chase recent changes
Commit 2619c5ccfe ("Avoid waiting on physical allocations that can't
possibly be satisfied") changed the return value from bool to errno.
Adjust the function description to match reality.
John Baldwin [Tue, 2 Jan 2024 21:15:13 +0000 (13:15 -0800)]
i386: Always bounce DMA requests above 4G for !PAE kernels
i386 kernels without 'options PAE' will still use PAE page tables if
the CPU supports PAE both to support larger amounts of RAM and for
PG_NX permissions. However, to avoid changing the API, bus_addr_t and
related constants (e.g. BUS_SPACE_MAXADDR) are still limited to
32 bits.
To cope with this, the x86 bus_dma code included an extra check to
bounce requests for addresses above BUS_SPACE_MAXADDR. This check was
elided (probably because it looks always-true on its face and had no
comment explaining its purpose) in recent refactoring. To fix,
restore a custom address-validation function for i386 kernels without
options PAE that includes this check.
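A simplified model of why the check matters, under stated assumptions. Physical addresses are 64-bit while bus addresses stay 32-bit. The tag's exclusion window is (lowaddr, highaddr], as described in bus_dma(9). The type and function names are hypothetical. With the common highaddr of BUS_SPACE_MAXADDR, an address above 4G falls outside the window, so only the extra check bounces it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_paddr_t;	/* PAE page tables: 64-bit */
typedef uint32_t sketch_bus_addr_t;	/* !PAE kernel API: 32-bit */
#define SKETCH_BUS_SPACE_MAXADDR	0xFFFFFFFFUL

static bool
must_bounce(sketch_paddr_t paddr, sketch_bus_addr_t lowaddr,
    sketch_bus_addr_t highaddr)
{
	/* Generic check: addresses inside the tag's exclusion window. */
	if (paddr > lowaddr && paddr <= highaddr)
		return (true);
	/*
	 * Looks always-false, since no bus address exceeds the maximum,
	 * but paddr is 64-bit here and may lie above 4G.
	 */
	return (paddr > SKETCH_BUS_SPACE_MAXADDR);
}
```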
Gleb Smirnoff [Tue, 2 Jan 2024 21:05:46 +0000 (13:05 -0800)]
netlink: refactor control data generation for recvmsg(2)
Netlink should return very simple control data on every recvmsg(2)
syscall. This data is associated with the syscall, not with an nlmsg,
nor with our internal representation (nl_bufs). There is no need to
pre-allocate it in a non-sleepable context and attach it to an
nl_buf. Allocate it right in the syscall with M_WAITOK. This also
shaves lots of code and simplifies things.
Gleb Smirnoff [Tue, 2 Jan 2024 21:05:25 +0000 (13:05 -0800)]
netlink: improve nl_soreceive()
The previous commit conservatively mimicked the operation of soreceive_generic().
The new code does two things:
- parses Netlink message headers and always returns at least one full nlmsg
- hides nl_buf boundaries from the userland, copying out several at once
More details can be found in the large comment block added.
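This sketch covers the "whole messages" part of that behavior. It assumes a linear queue of back-to-back messages whose first 4 bytes hold their total length, like nlmsg_len. That is not the kernel's nl_buf layout, and copy_whole_messages() is a hypothetical name. As many whole messages as fit are copied. If even the first one does not fit, it is truncated and dropped, and a zero-size read leaves the queue untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns bytes copied; '*consumed' is how many queued bytes to drop. */
static size_t
copy_whole_messages(const uint8_t *q, size_t qlen, uint8_t *dst,
    size_t dstlen, size_t *consumed)
{
	size_t off = 0;
	uint32_t mlen;

	*consumed = 0;
	if (dstlen == 0)
		return (0);
	while (qlen - off >= sizeof(mlen)) {
		memcpy(&mlen, q + off, sizeof(mlen));
		if (mlen < sizeof(mlen) || mlen > qlen - off)
			break;			/* malformed or partial */
		if (mlen > dstlen - off) {
			if (off == 0) {
				/* First message too big: truncate it. */
				memcpy(dst, q, dstlen);
				*consumed = mlen;
				return (dstlen);
			}
			break;
		}
		memcpy(dst + off, q + off, mlen);
		off += mlen;
	}
	*consumed = off;
	return (off);
}
```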
Gleb Smirnoff [Tue, 2 Jan 2024 21:04:01 +0000 (13:04 -0800)]
netlink: use protocol specific receive buffer
Implement the Netlink socket receive buffer as a simple TAILQ of nl_buf's,
in the same part of struct sockbuf that is already used for the send buffer.
This shaves a lot of code and a lot of extra processing. The pcb gets rid
of the I/O queues, as the socket buffer is exactly the queue. The
message writer is simplified a lot, as we now always deal with a linear
buf. The notion of different buffer types goes away, as well as different
kinds of writers. The only things remaining are a socket writer and
a group writer.
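A minimal sketch of a receive buffer kept as a TAILQ of buffers, using <sys/queue.h>. struct nlbuf and struct rcvbuf are hypothetical, simplified stand-ins, not the kernel's nl_buf and sockbuf. The byte count is what a writer would check against the buffer limit.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct nlbuf {
	TAILQ_ENTRY(nlbuf) link;
	size_t datalen;
	char data[];
};

struct rcvbuf {
	TAILQ_HEAD(, nlbuf) head;
	size_t cc;		/* bytes currently queued */
};

static void
rcvbuf_init(struct rcvbuf *sb)
{
	TAILQ_INIT(&sb->head);
	sb->cc = 0;
}

static int
rcvbuf_append(struct rcvbuf *sb, const void *p, size_t len)
{
	struct nlbuf *nb = malloc(sizeof(*nb) + len);

	if (nb == NULL)
		return (-1);
	nb->datalen = len;
	memcpy(nb->data, p, len);
	TAILQ_INSERT_TAIL(&sb->head, nb, link);
	sb->cc += len;
	return (0);
}

/* Remove the oldest buffer; the caller frees it. */
static struct nlbuf *
rcvbuf_dequeue(struct rcvbuf *sb)
{
	struct nlbuf *nb = TAILQ_FIRST(&sb->head);

	if (nb != NULL) {
		TAILQ_REMOVE(&sb->head, nb, link);
		sb->cc -= nb->datalen;
	}
	return (nb);
}
```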
The impact on the network stack is that we no longer use mbufs, so
a workaround from d18715475071 disappears.
Note on message throttling. Now the taskqueue throttling mechanism
needs to look at both socket buffers, protected by their respective
locks, and at flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this change tries
to preserve as much as possible.
Note on new nl_soreceive(). It emulates soreceive_generic(). It
must undergo further optimization, see large comment put in there.
Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed. However, I left
it with minimal functionality (it basically checks that when allocating
N bytes we get N bytes), as it is one of the few examples of the ktest
framework, which allows testing KPIs with Python.
Note on Linux support. It got much simpler: the Netlink message writer
loses the notion of Linux support lifetime; it is the same regardless of
process ABI. On socket write from a Linux process we perform the
conversion immediately in nl_receive_message(), and on output the
conversion to Linux happens in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.
Gleb Smirnoff [Tue, 2 Jan 2024 21:03:49 +0000 (13:03 -0800)]
tests/netlink: add netlink socket buffer test
With the upcoming protocol-specific socket buffer for Netlink we need some
additional tests that cover basic socket operations, without much actual
Netlink knowledge. The following tests are performed:
1) Overflow. If an application keeps sending messages to the kernel,
but doesn't read out the replies, then first the receive buffer shall
fill and after that further messages from applications will be queued
on the send buffer until it is filled. After that socket operations
should block. However, reading some data from the receive buffer should
wake up the taskqueue and the send buffer should start draining again.
2) Peek & trunc. Check that the socket correctly reports the amount of
readable data with MSG_PEEK & MSG_TRUNC. This is a typical pattern of
Netlink apps.
3) Sizes. Check that a zero-size read doesn't affect the socket, and that
an undersized read returns one truncated message and removes that message
from the buffer. Check that a large buffer will be filled in one read,
without any boundaries imposed by the internal representation of the
buffer. Check that any meaningful read is amended with control data if
requested.
Gleb Smirnoff [Tue, 2 Jan 2024 21:03:40 +0000 (13:03 -0800)]
netlink: uninline some KPI functions that work with struct nl_writer
These functions work with a buffer embedded into nl_writer, which
is going to go opaque with upcoming changes. Make them private to
the netlink module. No functional change intended.
Gleb Smirnoff [Tue, 2 Jan 2024 21:03:21 +0000 (13:03 -0800)]
netlink: use domain specific send buffer
Instead of using generic socket code, create a Netlink-specific socket
buffer. It is a simple TAILQ of writes that came from userland. This
saves us one memory allocation that could fail and one memory copy.
Alex Richardson [Tue, 2 Jan 2024 19:06:51 +0000 (11:06 -0800)]
kldxref: fix bootstrapping on Linux with Clang 16
The glibc fts_open() callback type does not have the second const
qualifier and it appears that Clang 16 errors by default for mismatched
function pointer types. Add an ifdef to handle this case.
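Below is a sketch of the kind of ifdef described. FreeBSD declares the fts_open() comparison callback with `const FTSENT * const *` arguments, while glibc uses `const FTSENT **`. Selecting the qualifier with the preprocessor lets one callback match either prototype. Testing __GLIBC__ and the macro name FTSENT_CMP_CONST are assumptions of this sketch; the actual fix may test differently, and other C libraries may need their own case.

```c
#include <fts.h>
#include <string.h>

#ifdef __GLIBC__
#define FTSENT_CMP_CONST		/* glibc: no second const */
#else
#define FTSENT_CMP_CONST const		/* FreeBSD prototype */
#endif

static int
compare_names(const FTSENT * FTSENT_CMP_CONST *a,
    const FTSENT * FTSENT_CMP_CONST *b)
{
	return (strcmp((*a)->fts_name, (*b)->fts_name));
}

/* Walk 'paths' physically, visiting siblings in name order. */
static FTS *
open_sorted(char *const *paths)
{
	return (fts_open(paths, FTS_PHYSICAL, compare_names));
}
```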
Richard Kümmel [Fri, 15 Dec 2023 11:49:45 +0000 (12:49 +0100)]
Fix udp IPv4-mapped address
Do not use the cached route if the destination isn't the same.
This fixes a problem where a UDP packet would be sent via the wrong route
and interface if a previous packet had been sent via them.
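The rule behind the fix, reduced to a toy model: a cached route may be reused only for the destination it was looked up for. All names here are hypothetical, lookup_route() is a toy routing table, and the sketch ignores the IPv4-mapped IPv6 socket details.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

struct route_cache {
	bool valid;
	struct in_addr dst;	/* destination the cached lookup was for */
	int ifindex;		/* interface that lookup selected */
};

/* Toy table: 10.0.0.0/8 via interface 1, everything else via 2. */
static int
lookup_route(struct in_addr dst)
{
	return ((ntohl(dst.s_addr) >> 24) == 10 ? 1 : 2);
}

static int
select_ifindex(struct route_cache *rc, struct in_addr dst)
{
	/* Reuse the cached result only for the same destination. */
	if (!rc->valid || rc->dst.s_addr != dst.s_addr) {
		rc->ifindex = lookup_route(dst);
		rc->dst = dst;
		rc->valid = true;
	}
	return (rc->ifindex);
}
```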
Mark Johnston [Mon, 1 Jan 2024 18:54:15 +0000 (13:54 -0500)]
zfs: Fix SPA sysctl handlers
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl(). In particular, this code triggers an assertion
failure in sbuf_clear().
Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.
Apply to FreeBSD directly since this bug causes "sysctl -a" to crash the
kernel.
Warner Losh [Mon, 1 Jan 2024 06:12:54 +0000 (23:12 -0700)]
test-includes: Add -ansi to the compile line to catch problems
We support C89 files, but compile everything in the tree with C99 or
newer. Compiling these with -ansi forces C89, which doesn't
understand inline. All our header files must use __inline instead of
inline when they define inline functions.
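A minimal example of a header written to survive -ansi. The header and function names are hypothetical. GCC and Clang accept the reserved spelling __inline in every language mode. Plain `inline` is not a C89 keyword, so under -ansi it would not be recognized.

```c
/* example.h: stays usable from C89 (-ansi) consumers. */
#ifndef EXAMPLE_H
#define EXAMPLE_H

static __inline int
example_min(int a, int b)
{
	return (a < b ? a : b);
}

#endif /* !EXAMPLE_H */
```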
Rick Macklem [Sun, 31 Dec 2023 23:55:24 +0000 (15:55 -0800)]
vfs_vnops.c: Fix vn_generic_copy_file_range() for truncation
When copy_file_range(2) was first being developed,
*inoffp + len had to be <= infile_size or an error was
returned. This semantic (as defined by Linux) changed
to allow *inoffp + len to be greater than infile_size and
the copy would end at infile_size (EOF).
Unfortunately, the code that decided if the outfd should
be truncated in length did not get updated for this
semantics change.
As such, if a copy_file_range(2) is done, where infile_size - *inoffp
is less than outfile_size but len is large, the outfd file is truncated
when it should not be. (The semantics for this for Linux is to not
truncate outfd in this case.)
This patch fixes the problem. I believe the calculation is safe
for all non-negative values of outsize, *outoffp, *inoffp and insize,
which should be ok, since they are all guaranteed to be non-negative.
Note that this bug is not observed over NFSv4.2, since it truncates
len to infile_size - *inoffp.
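This sketch states the Linux-compatible semantics described above: the copy stops at the input's EOF, and copying never shrinks the output file. It is not the code in vfs_vnops.c. All values are assumed non-negative and small enough that outoff plus the copy length does not overflow.

```c
#include <sys/types.h>

/* Bytes actually copied: 'len', clamped at the input file's EOF. */
static off_t
effective_copy_len(off_t insize, off_t inoff, off_t len)
{
	if (inoff >= insize)
		return (0);
	return (len < insize - inoff ? len : insize - inoff);
}

/* Output file size afterwards: may grow, never truncated. */
static off_t
resulting_outsize(off_t insize, off_t inoff, off_t len, off_t outsize,
    off_t outoff)
{
	off_t end = outoff + effective_copy_len(insize, inoff, len);

	return (end > outsize ? end : outsize);
}
```

For example, copying from offset 90 of a 100-byte file with a large len into the start of a 500-byte file copies 10 bytes and leaves the output at 500 bytes.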
Mark Johnston [Sun, 31 Dec 2023 16:36:12 +0000 (11:36 -0500)]
gtaskqueue: Fix a typo
This is a no-op in practice since gtaskqueue_thread_enqueue() and
taskqueue_thread_enqueue() are identical, and while _gtaskqueue_create()
compares the enqueue callback pointer with gtaskqueue_thread_enqueue(),
the result has no effect since TQ_FLAGS_UNLOCKED_ENQUEUE was copied
directly from subr_taskqueue.c and is unused in the gtaskqueue code.
Fix it anyway since it's a bug. More generally we really need to
consolidate subr_taskqueue.c and subr_gtaskqueue.c.
Mark Johnston [Sun, 31 Dec 2023 16:15:48 +0000 (11:15 -0500)]
frag6: Reduce code duplication
The code which removes a fragment queue from the per-VNET hash table was
duplicated three times. Factor it out into a function. No functional
change intended.
Michael Osipov [Fri, 24 Nov 2023 09:26:41 +0000 (10:26 +0100)]
periodic: Make daily diff(1) output as small as possible
By default, make the daily diff(1) ignore whitespace changes and produce unified
output with a context of zero (0) lines. This reduces unrelated lines in e-mails
delivered to root.
Michael Osipov [Fri, 24 Nov 2023 09:26:41 +0000 (10:26 +0100)]
periodic: Make security diff(1) output as small as possible
By default, make the security diff(1) produce unified output with a context of
zero (0) lines. This reduces unrelated lines in e-mails delivered
to root.
Simon J. Gerraty [Sat, 30 Dec 2023 17:10:03 +0000 (09:10 -0800)]
bsd.man.mk: allow staging compressed pages
In the DIRDEPS_BUILD we use staging.
The staging logic in bsd.man.mk was in the wrong place, shift it
and add compressed man pages to the stage set if appropriate.
The kern.pid_max_limit will hold the PID_MAX value the kernel was
compiled with. The existing kern.pid_max sysctl can be modified and
doesn't really represent the maximum PID number in the system, as there
may still be processes created with higher PIDs before kern.pid_max
was lowered.
Rick Macklem [Fri, 29 Dec 2023 22:59:00 +0000 (14:59 -0800)]
copy_file_range.2: Clarify that only regular files work
PR#273962 reported that copy_file_range(2) did not work
on shared memory objects and returned EINVAL.
Although the reporter felt this was incorrect, it is what
the Linux copy_file_range(2) syscall does.
Since there was no collective agreement that the FreeBSD
semantics should be changed to no longer be Linux compatible,
copy_file_range(2) still works on regular files only.
This man page update clarifies that. If, someday, copy_file_range(2)
is changed to support non-regular files, then the man page will
need to be updated to reflect that.
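Because only regular files are accepted, a caller copying between arbitrary descriptors can check up front and use a read(2)/write(2) loop for anything else. The helper name below is hypothetical, and this is a minimal sketch.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* True if both descriptors refer to regular files. */
static bool
can_copy_file_range(int infd, int outfd)
{
	struct stat in, out;

	if (fstat(infd, &in) != 0 || fstat(outfd, &out) != 0)
		return (false);
	return (S_ISREG(in.st_mode) && S_ISREG(out.st_mode));
}
```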
Mateusz Guzik [Fri, 29 Dec 2023 18:51:56 +0000 (18:51 +0000)]
llvm: Support: don't block signals around close if it can be avoided
Signal blocking originally showed up in 51c2afc4b65b2782 ("Support:
Don't call close again if we get EINTR"), but it was overzealous --
there are systems where the error is known to be fine.
This commit elides signal blocking for said systems (the list is
incomplete though).
Note close() can still fail for other reasons (like ENOSPC), in which
case an error will be returned while the fd slot is cleared up.
Reviewed by: dim
Differential Revision: https://reviews.freebsd.org/D42984
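A sketch of the idea, not LLVM's Support code. It assumes that on Linux and FreeBSD close(2) releases the descriptor even when it fails with EINTR. There, close() is called once and never retried. Elsewhere, signals are blocked around the call so that a signal handler cannot interrupt it. close_once() is a hypothetical name.

```c
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static int
close_once(int fd)
{
#if defined(__linux__) || defined(__FreeBSD__)
	/* EINTR: the descriptor is already released, so do not retry. */
	if (close(fd) == -1 && errno != EINTR)
		return (-1);
	return (0);
#else
	sigset_t all, saved;
	int error, ret;

	sigfillset(&all);
	pthread_sigmask(SIG_SETMASK, &all, &saved);
	ret = close(fd);
	error = errno;
	pthread_sigmask(SIG_SETMASK, &saved, NULL);
	errno = error;
	return (ret);
#endif
}
```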
After ad874544d9f018bf8eef4053b5ca7b856c4674cb, interface name
validation was removed, resulting in two unit test failures.
Drop the failing tests since they no longer apply.
Kenneth D. Merry [Thu, 28 Dec 2023 21:23:16 +0000 (16:23 -0500)]
camcontrol: Add a sense subcommand
As the name suggests, this sends a SCSI REQUEST SENSE to a device,
and prints out decoded sense information. It can also print out a
hexdump of the sense data.
sbin/camcontrol/camcontrol.c:
Add the new sense subcommand.
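For reference, the main fields of REQUEST SENSE data sit at fixed offsets defined by SPC. Fixed format (response codes 0x70/0x71) has the sense key in byte 2 and ASC/ASCQ in bytes 12 and 13. Descriptor format (0x72/0x73) has them in bytes 1, 2 and 3. This is a standalone sketch; camcontrol itself relies on CAM's own sense-handling routines.

```c
#include <stddef.h>
#include <stdint.h>

struct sense_info {
	uint8_t key;	/* SENSE KEY */
	uint8_t asc;	/* ADDITIONAL SENSE CODE */
	uint8_t ascq;	/* ADDITIONAL SENSE CODE QUALIFIER */
};

/* Returns 0 on success, -1 for unknown formats or short buffers. */
static int
decode_sense(const uint8_t *s, size_t len, struct sense_info *si)
{
	if (len < 1)
		return (-1);
	switch (s[0] & 0x7f) {
	case 0x70:	/* fixed format, current */
	case 0x71:	/* fixed format, deferred */
		if (len < 14)
			return (-1);
		si->key = s[2] & 0x0f;
		si->asc = s[12];
		si->ascq = s[13];
		return (0);
	case 0x72:	/* descriptor format, current */
	case 0x73:	/* descriptor format, deferred */
		if (len < 4)
			return (-1);
		si->key = s[1] & 0x0f;
		si->asc = s[2];
		si->ascq = s[3];
		return (0);
	default:
		return (-1);
	}
}
```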
John Baldwin [Thu, 28 Dec 2023 19:17:22 +0000 (11:17 -0800)]
mbuf.9: Document mtodo
mtodo() accepts an mbuf and offset and returns a void * pointer to the
requested offset into the mbuf's associated data. Similar to mtod(),
no bounds checking is performed.
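A userland sketch of how mtodo() relates to mtod(). struct mbuf_sketch is a hypothetical, minimal stand-in for struct mbuf. The two macros are modeled on the mbuf(9) definitions as we understand them. Like the originals, neither one checks bounds.

```c
#include <stddef.h>

struct mbuf_sketch {
	char *m_data;	/* start of valid data */
	int m_len;	/* bytes of valid data */
};

#define mtod(m, t)	((t)((m)->m_data))
#define mtodo(m, o)	((void *)(((m)->m_data) + (o)))

/* Caller must ensure off < m->m_len; mtodo() does not check. */
static unsigned
read_u8_at(struct mbuf_sketch *m, int off)
{
	return (*(unsigned char *)mtodo(m, off));
}
```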
Joerg Pulz [Fri, 27 Oct 2023 15:27:37 +0000 (17:27 +0200)]
isp(4): Rework firmware handling/loading
Correctly identify the active firmware in flash on adapters with
primary and secondary firmware region in flash.
Correctly identify the active NVRAM on adapters with primary
and secondary NVRAM region in flash.
Loading ispfw(4) moved from isp_pci_attach() to isp_reset().
Drop the reference to ispfw(4) after using it so one can kldunload(8) it.
New isp_load_ram() function to load either ispfw(4) or flash firmware
into RISC's RAM.
New functions to read data from flash. The old ones will be removed later.
A bunch of new helper functions to identify and validate active flash
regions for firmware, auxiliary and NVRAM.
Overhaul ISP_FW_* macros and make use of them when comparing firmware
versions. We can handle firmware versions up to 255.255.255.
Firmware load priority slightly changed:
For 27xx and newer adapters:
- load ispfw(4) firmware
- request (active) flash firmware information
- compare version numbers of ispfw(4) and flash firmware
- load firmware with highest version into RISC's RAM
- if loading ispfw(4) is disabled or failed - load firmware from flash
- if everything else fails use MBOX_LOAD_FLASH_FIRMWARE as fallback
For 26xx and older adapters nothing changed:
- load ispfw(4) firmware and load it into RISC's RAM
- if loading ispfw(4) is disabled or failed use MBOX_EXEC_FIRMWARE
- for 26xx a preceding MBOX_LOAD_FLASH_FIRMWARE is used
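A sketch of the comparison idea for 27xx and newer adapters. With each component limited to 0..255, major.minor.micro packs into 24 bits and compares as a plain integer. FW_VERSION() and pick_firmware() are hypothetical, not the driver's ISP_FW_* macros. A version of 0 stands for "not available", and ties go to flash, which is an arbitrary choice of this sketch.

```c
#include <stdint.h>

#define FW_VERSION(maj, min, mic) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(mic))

enum fw_source { FW_NONE, FW_ISPFW, FW_FLASH };

static enum fw_source
pick_firmware(uint32_t ispfw_ver, uint32_t flash_ver)
{
	if (ispfw_ver == 0 && flash_ver == 0)
		return (FW_NONE);	/* fall back to MBOX_LOAD_FLASH_FIRMWARE */
	return (ispfw_ver > flash_ver ? FW_ISPFW : FW_FLASH);
}
```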
New read only sysctl(8)'s:
dev.isp.N.fw_version_run: the firmware version actually running
dev.isp.N.fw_version_ispfw: the firmware version provided by ispfw(4)
dev.isp.N.fw_version_flash: the (active) firmware version in flash
While here:
- firmware attribute handling/parsing reworked
+ renamed defines from ISP2400_FW_ATTR_* to ISP_FW_ATTR_*
+ changed values to match new handling/parsing
+ added some more attributes
- enable FLT support on 26xx based adapters
- log level adjustments
- new function return status codes (some for now, some for later use)
- some minor style changes
Tested and approved to work on real hardware with:
- Qlogic ISP 2532 (QLogic QLE2560 8Gb FC Adapter)
- Qlogic ISP 2031 (QLogic QLE2662 16Gbit 2Port FC Adapter)
- Qlogic ISP 2722 (QLogic QLE2690 16Gb FC Adapter)
- Qlogic ISP 2812 (QLogic QLE2772 32Gbit 2Port FC Adapter)
PR: 273263
Reviewed by: mav
Pull Request: https://github.com/freebsd/freebsd-src/pull/877
MFC after: 1 month
Sponsored by: Technical University of Munich
Mark Johnston [Thu, 28 Dec 2023 17:08:04 +0000 (12:08 -0500)]
cam: Let cam_periph_unmapmem() return an error
As of commit b059686a71c8, cam_periph_unmapmem() can legitimately fail
if the copyout() operation fails. However, this failure was never
signaled to upper layers. In practice it is unlikely to occur
since cam_periph_mapmem() would most likely fail in such
circumstances anyway, but an error is nonetheless possible.
However, some code reading revealed a few paths where the return value
of cam_periph_mapmem() is not checked, and this is definitely a bug.
Add error checking there and let cam_periph_unmapmem() return errors
from copyout().