Warner Losh [Thu, 9 Nov 2023 17:56:12 +0000 (10:56 -0700)]
_bus.h: Use standard licnese text
All of these used the 'immediately at beginning' variation of the
BSD-2-Clause license. This wasn't intentional, just what I copied from
from a random file in the tree back in 2005. It was not an intentional
decision.
The different arch bus.h files are a mix of BSD-2-Clause and
BSD-4-Clause that have various copyright holders (Charles M. Hannum,
Christopher G. Demetriou, The NetBSD Foundation and KATO Takenori), and
some of the content of these files were likely copied from there.
However, apart from the uncopyrightable interface lines, there are very
few comments. It's unclear if these comments are 'original material'
here to copyright, but to the extent that there is, license it under the
standard BSD-2-Clause copyright that's the norm for the project today.
In any event, the standard BSD-2-Clause is also closer to those
originals.
In addition, FreeBSD uses different type definitions than the original
NetBSD code in part. The comments that were copied have been copied a
lot, but appear in NetBSD's bus.h files in NetBSD 1.3.
While I'm here, assign the copyright, to the extent any exists from me,
to the FreeBSD Foundation. I just cut and pasted these into _bus.h from
the different machine files and those files have a rich history of
modification from the original imports from NetBSD over more than 25
years so it's tricky to say who, exactly, wrote each bit. Given the size
of the files, this seems like the best compromise. Also add an
acknowledgement to the NetBSD 1.3 bus.h files and their authors (there
were no additional FreeBSD authors listed in the various
sys/*/include/bus.h files). Finally, use the SPDX identifier instead of
multiple copies of the text.
Bojan Novković [Mon, 13 Nov 2023 18:02:30 +0000 (20:02 +0200)]
tty: properly check character position when handling IUTF8 backspaces
The tty_rubchar() code handling backspaces for UTF-8 characters didn't
properly check whether the beginning of the current line was reached.
This resulted in a kernel panic in ttyinq_unputchar() when prodded with
certain malformed UTF-8 sequences.
Mark Johnston [Mon, 13 Nov 2023 15:44:45 +0000 (10:44 -0500)]
arm64: Initialize x18 for APs earlier during boot
When KMSAN is configured, the instrumentation inserts calls to
__msan_get_context_state() into all function prologues. The
implementation dereferences curthread and thus assumes that x18 points
to the PCPU area. This applies in particular to init_secondary(), which
currently is responsible for initializing x18 for APs.
Move initialization into locore to avoid this problem. No functional
change intended.
Reviewed by: kib, andrew
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D42533
Warner Losh [Mon, 13 Nov 2023 14:23:53 +0000 (07:23 -0700)]
busdma: On systmes that use subr_busdma_bounce, measure deferred time
Measure the total deferred time (from the time we decide to defer until
we try again) for busdma_load requests. On systems that don't ever
defer, there is no performnce change. Add new sysctl
hw.busdma.zoneX.total_deferred_time to report this (in
microseconds).
Normally, deferrals don't happen in modern hardware... Except there's a
lot of buggy hardware that can't cope with memory > 4GB or that can't
cross a 4GB boundary (or even more restrictive values), necessitating
bouncing. This will measure the effect on the I/Os of this deferral.
Add sshd and local_unbound to the oom protected services.
syslogd is protected by default already, document it.
This was discussed on arch@, see
https://lists.freebsd.org/archives/freebsd-arch/2023-November/000543.html
sshd is protected to be able to investigate and fix oom issues on systems
which don't have out-of-band console access.
local_unbound is protected as it may be enabled for local use and without
DNS a lot grinds to a halt (including sshd).
ps.1: update regarding -D option and -p x/d interaction
The -p option does not imply -x, it is merely a different mode that ps
uses. Remove that statement from the -p option, effectively rolling back d6ae056e9dc96c2db45982ac358ba9ed716a9202.
pstef@ introduced the -D option in 5c0a1c15ff8cb66128f4826ace8ba91e0a31486d
which also turns ps into a similar mode. List the -D option along with
the others in the first sentence of the second paragraph of the
DESCRIPTION section for completeness and correctness sake.
Pointed out by: pstef@
Differential Revision: https://reviews.freebsd.org/D42552
Andrew Gallatin [Sat, 11 Nov 2023 17:54:19 +0000 (12:54 -0500)]
arm64: Implement bus_get_resource and bus_delete_resource.
These devmethods were not defined, leading to the surprising result
of using bus_set_resource(), and then immediately turning around
and getting zeros back from bus_get_resource(). These are now
simply passed through to the generic definitions, since there
is no need for them to be arm64 specific.
Note that jhb plans to replace most of the devmethods with
the generic versions.
Aaron LI [Sat, 11 Nov 2023 13:13:08 +0000 (14:13 +0100)]
if_wg: Missing radix unlock can cause deadlock
In function 'wg_aip_add()', the error path of returning ENOMEM when
(node == NULL) is forgetting to unlock the radix tree, and thus may lead
to a deadlock.
Andrew Turner [Mon, 30 Oct 2023 14:33:08 +0000 (14:33 +0000)]
arm64: Check if PSCI before calling SMCCC
As SMCCC depends on PSCI check if the latter is present before calling
the former. This fixes an issue where we may call into SMCCC when there
is no PSCI in the system causing a smccc_version assert to fail.
Andrew Turner [Wed, 25 Oct 2023 12:34:38 +0000 (13:34 +0100)]
arm64: Create a Linux view of the ID registers
When adding support for new hardware extensions we may not want to
enable support for the FreeBSD and Linux ABIs at the same time. To
support this split the Linux ID register and hwcaps so they can be
configured separately.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42372
Andrew Turner [Thu, 26 Oct 2023 10:21:57 +0000 (11:21 +0100)]
arm64: Use a hwcap ID rather than pointer
To allow for a different Linux hwcap value store the hwcap ID rather
than a pointer to elf{32,}_hwcap{2,}. This will be needed when creating
a different view of the ID registers for FreeBSD and Linux.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42371
Andrew Turner [Thu, 2 Nov 2023 09:49:27 +0000 (09:49 +0000)]
sysent: Add sv_protect
To allow for architecture specific protections add sv_protect to struct
sysent. This can be used to apply these after the executable is loaded
into the new address space.
Reviewed by: kib
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42440
Andrew Turner [Wed, 1 Nov 2023 15:18:12 +0000 (15:18 +0000)]
imgact_elf: Export __elfN(parse_notes)
This is useful to check if a note is present and contains an expected
value, e.g. to read NT_GNU_PROPERTY_TYPE_0 on arm64 to see if we should
enable BTI.
Andrew Turner [Thu, 2 Nov 2023 17:46:34 +0000 (17:46 +0000)]
bsd.compat.mk: Set MACHINE before including bsd.opts.mk
In bsd.opts.mk we check MACHINE_ARCH and may want to check MACHINE to
decide which options to enable. Unfortunately this is included too
early via bsd.compiler.mk.
Move including bsd.compiler.mk until after we can set MACHINE and
MACHINE_ARCH.
Reviewed by: imp
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42448
Andrew Turner [Thu, 9 Nov 2023 13:00:51 +0000 (13:00 +0000)]
llvm: Reduce overlinking with the minimal llvm
We only need to link against libz and libzstd when linking against the
fill libllvm, libllvmminimal doesn't use either library. Move adding
libz and libzstd to the list of libraries to link against to where
we decide to use the full libllvm.
Reported by: Cristian Marussi <Cristian.Marussi@arm.com>
Reported by: Colin S. Gordon <csgordon@fastmail.com>
Reviewed by: dim
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D42528
Kyle Evans [Fri, 10 Nov 2023 04:33:58 +0000 (22:33 -0600)]
crunchgen: fix "keep" for an ELF world, break it out
"keep" currently adds a leading underscore, which hasn't been useful or
accurate since a.out days. Preserve the symbol name as it's given
rather than mangle it to match ELF-style symbol names.
This was partially fixed back in 6cd35234a092d ("Assume ELF-style symbol names now.") for crunchgen, but
the keeplist wasn't changed to match it.
While we're here, break it out to bsd.crunchgen.mk for later use in
bsdbox.
Alexander Motin [Fri, 10 Nov 2023 00:46:26 +0000 (19:46 -0500)]
uma: Improve memory modified after free panic messages
- Pass zone pointer to trash_ctor() and report zone name in the panic
message. It may be difficult to figyre out zone just by the item size.
- Do not pass user arguments to internal trash calls, pass thezone.
- Report malloc type name in the same unified panic message.
- Report corruption offset from the beginning of the items instead of
the full pointer. It makes panic message shorter and more readable.
Tom Jones [Mon, 25 Sep 2023 18:33:45 +0000 (20:33 +0200)]
nlm: Fix error messages for failed remote rpcbind contact
In case of a remote rpcbind connection timeout,
the NFS kernel lock manager emits an error message
along the lines of:
NLM: failed to contact remote rpcbind, stat = 5, port = 28416
In the Bugzilla PR, Garrett Wollman identified the following problems
with that error message:
- The error is in decimal, which can only be deciphered by reading the
source code.
- The port number is byte-swapped.
- The error message does not identify the client the NLM is trying to
communicate with.
Fix the shortcomings of the current error message by:
- Printing out the port number correctly.
- Mentioning the remote client.
The low-level decimal error remains an outstanding issue though.
It seems like the error strings describing the error codes live outside
of the kernel code currently.
PR: 244698
Reported by: wollman
Approved by: allanjude
Sponsored by: National Bureau of Economic Research
Sponsored by: Klara, Inc.
Co-authored-by: Mateusz Piotrowski <0mp@FreeBSD.org>
Alexander Motin [Thu, 9 Nov 2023 18:07:46 +0000 (13:07 -0500)]
uma: Micro-optimize memory trashing
Use u_long for memory accesses instead of uint32_t. On my tests on
amd64 this by ~30% reduces time spent in those functions thanks to
bigger 64bit accesses. i386 still uses 32bit accesses.
The underlying issue that originally triggered a kernel panic was
addressed and the fix was ported to all relevant pmaps, so the
safeguards placed in vm_fault.c can be removed now.
Michael Tuexen [Thu, 9 Nov 2023 10:37:27 +0000 (11:37 +0100)]
if_tuntap: support receive checksum offloading for tap interfaces
When enabled, pretend that the IPv4 and transport layer checksum
is correct for packets injected via the character device.
This is a prerequisite for adding support for LRO, which will
be added next. Then packetdrill can be used to test the LRO
code in local mode.
shodanshok [Thu, 9 Nov 2023 00:30:47 +0000 (01:30 +0100)]
Increase L2ARC write rate and headroom
Current L2ARC write rate and headroom parameters are very conservative:
l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at
8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs
at 2x these rates).
These values were selected 15+ years ago based on then-current SSDs
size, performance and endurance. Today we have multi-TB, fast and
cheap SSDs which can sustain much higher read/write rates.
For this reason, this patch increases l2arc_write_max to 32M and
l2arc_headroom to 8 (4x increase for both).
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15457
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes #15504
Warner Losh [Wed, 8 Nov 2023 22:38:16 +0000 (15:38 -0700)]
cam: Add human readable statuses for some CAM_ status values.
CAM_NVME_STATUS and CAM_REQ_SOFTTIMEOUT were missing, though the latter
hasn't been used yet. The former is being used and showing up in dmesg
output as Unknown 0x420.
Low-power [Wed, 8 Nov 2023 20:19:38 +0000 (04:19 +0800)]
Linux: reject read/write mapping to immutable file only on VM_SHARED
Private read/write mapping can't be used to modify the mapped files, so
they will remain be immutable. Private read/write mappings are usually
used to load the data segment of executable files, rejecting them will
rendering immutable executable files to stop working.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: WHR <msl0000023508@gmail.com>
Closes #15344
AllKind [Wed, 8 Nov 2023 18:30:46 +0000 (19:30 +0100)]
Workaround to allow openzfs-zfs-dkms install on Ubuntu
As shown in #15404#issuecomment-1765002181, Ubuntu kernel has
'Provides: zfs-dkms', which will cause uninstall of the kernel, when
attempting to install openzfs-zfs-dkms.
As a workaround remove the 'Conflicts: zfs-dkms' definition from
the debian control file.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15503
Don Brady [Wed, 8 Nov 2023 18:19:41 +0000 (11:19 -0700)]
RAID-Z expansion feature
This feature allows disks to be added one at a time to a RAID-Z group,
expanding its capacity incrementally. This feature is especially useful
for small pools (typically with only one RAID-Z group), where there
isn't sufficient hardware to add capacity by adding a whole new RAID-Z
group (typically doubling the number of disks).
== Initiating expansion ==
A new device (disk) can be attached to an existing RAIDZ vdev, by
running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank
raidz2-0 sda`. The new device will become part of the RAIDZ group. A
"raidz expansion" will be initiated, and the new device will contribute
additional space to the RAIDZ group once the expansion completes.
The `feature@raidz_expansion` on-disk feature flag must be `enabled` to
initiate an expansion, and it remains `active` for the life of the pool.
In other words, pools with expanded RAIDZ vdevs can not be imported by
older releases of the ZFS software.
== During expansion ==
The expansion entails reading all allocated space from existing disks in
the RAIDZ group, and rewriting it to the new disks in the RAIDZ group
(including the newly added device).
The expansion progress can be monitored with `zpool status`.
Data redundancy is maintained during (and after) the expansion. If a
disk fails while the expansion is in progress, the expansion pauses
until the health of the RAIDZ vdev is restored (e.g. by replacing the
failed disk and waiting for reconstruction to complete).
The pool remains accessible during expansion. Following a reboot or
export/import, the expansion resumes where it left off.
== After expansion ==
When the expansion completes, the additional space is available for use,
and is reflected in the `available` zfs property (as seen in `zfs list`,
`df`, etc).
Expansion does not change the number of failures that can be tolerated
without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after
expansion).
A RAIDZ vdev can be expanded multiple times.
After the expansion completes, old blocks remain with their old
data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but
distributed among the larger set of disks. New blocks will be written
with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been
expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ
vdev's "assumed parity ratio" does not change, so slightly less space
than is expected may be reported for newly-written blocks, according to
`zfs list`, `df`, `ls -s`, and similar tools.
Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com>
Closes #15022
Luiz Amaral [Wed, 8 Nov 2023 15:12:14 +0000 (16:12 +0100)]
tcpdump: decode pfsync packets on network interfaces
When print-ip-demux.c was introduced on ee67461e, the pfsync_ip_print
function was missed, causing tcpdump to treat pfsync packets on network
interfaces as an unknown protocol.
Bojan Novković [Wed, 8 Nov 2023 10:20:06 +0000 (05:20 -0500)]
riscv: Add a leaf PTP when pmap_enter(psind=1) creates a wired mapping
Let pmap_enter_l2() create wired mappings. In particular, allocate a
leaf PTP for use during demotion. This is the last pmap which requires
such a change ahead of reverting commit 64087fd7f372.
Oskar Holmlund [Wed, 8 Nov 2023 08:03:55 +0000 (09:03 +0100)]
UART: Remove ingenic xburst (mips) code from ns8250 driver
Since ingenic JZ4780 SOC support has been removed there is no need
to support ingenic quirks in the UART driver.
Invert of commit b192bae67ea835b7e431225bad375b5d5fe4297f
Reviewed by: imp, manu
Approved by: imp, manu (mentor)
Differential Revision: https://reviews.freebsd.org/D42497
Umer Saleem [Tue, 7 Nov 2023 21:24:16 +0000 (02:24 +0500)]
Linux 6.6 compat: fix implicit conversion error with debug build
With Linux v6.6.0 and GCC 12, when debug build is configured,
implicit conversion error is raised while converting
'enum <anonymous>' to 'boolean_t'. Use 'B_TRUE' instead of
'true' to fix the issue.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15489
Gordon Tetlow [Tue, 7 Nov 2023 21:21:56 +0000 (13:21 -0800)]
Add kern.features.zfs
Add a ZFS feature flag to indicate OpenZFS availability.
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gordon Tetlow <gordon@freebsd.org>
Closes #15484
Jason King [Tue, 7 Nov 2023 20:11:48 +0000 (14:11 -0600)]
sa_lookup() ignores buffer size.
When retrieving a system attribute, the size of the supplied
buffer is ignored. If the buffer is too small to hold the attribute,
sa_attr_op() will write past the end of the buffer.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jason King <jking@racktopsystems.com>
Closes #15476
Umer Saleem [Tue, 7 Nov 2023 20:04:56 +0000 (01:04 +0500)]
Remove obsolete_counts from grub2 compatibility list
PR#15459 add all read-only compatible zpool features to grub2
compatibility list. 'obsolete_counts' is a read-only features that
depends on 'device_removal' feature which is not read-only and
is marked as ZFEATURE_FLAG_MOS. Creating a pool with grub2
compatibility enables 'device_removal' feature as well, which is
not desired.
This commit removes the 'obsolete_counts' feature from
grub2 compatibility list, as GRUB only supports read-only
compatible features.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15499
certctl: Convert line endings before inspecting files.
This ensures that certificate files or bundles with DOS or Mac line
endings are recognized as such and handled identically to those with
Unix line endings.
Alexander Motin [Tue, 7 Nov 2023 19:37:18 +0000 (14:37 -0500)]
FreeBSD: Implement taskq_init_ent()
Previously taskq_init_ent() was an empty macro, while actual init
was done by taskq_dispatch_ent(). It could be slightly faster in
case taskq never enqueued. But without it taskq_empty_ent() relied
on the structure being zeroed by somebody else, that is not good.
As a side effect this allows the same task to be queued several
times, that is normal on FreeBSD, that may or may not get useful
here also one day.
Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15455
AllKind [Tue, 7 Nov 2023 19:27:29 +0000 (20:27 +0100)]
Fix dkms installation of deb packages created with Alien.
Alien does not honour the %posttrans hook.
So move the dkms uninstall/install scripts to the
%pre/%post hooks in case of package install/upgrade.
In case of package removal, handle that in %preun.
Add removal of all old dkms modules.
Add checking for broken 'dkms status'. Handle that as
good as possible and warn the user about it.
Also add more verbose messages about what we are doing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mart Frauenlob <AllKind@fastest.cc>
Closes #15415
Mark Johnston [Tue, 7 Nov 2023 18:24:15 +0000 (13:24 -0500)]
Make abd_raidz_gen_iterate() pass an initialized pointer to the callback
Otherwise callbacks may trigger KMSAN violations in the dlen == 0 case.
For example, raidz_syn_pq_abd() will compare an uninitialized pointer
with itself before returning. This seems harmless, but let's maintain
good hygiene and avoid passing uninitialized variables, if only to
placate KMSAN.
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #15491
Tony Hutter [Tue, 7 Nov 2023 17:09:24 +0000 (09:09 -0800)]
zed: misc vdev_enc_sysfs_path fixes
There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed
gets passed is stale. To mitigate this, dynamically check the sysfs
path at the time of zed event processing, and use the dynamic value if
possible. Note that there will be other times when we can not
dynamically detect the sysfs path (like if a disk disappears) and have
to rely on the old value for things like turning on the fault LED. That
is to say, we can't just blindly use the dynamic path in every case.
Also:
- Add enclosure sysfs entry when running 'zpool add'
- Fix 'slot' and 'enc' zpool.d scripts for nvme
Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #15462
MigeljanImeri [Tue, 7 Nov 2023 17:06:14 +0000 (10:06 -0700)]
Fix accounting error for pending sync IO ops in zpool iostat
Currently vdev_queue_class_length is responsible for checking how long
the queue length is, however, it doesn't check the length when a list
is used, rather it just returns whether it is empty or not. To fix this
I added a counter variable to vdev_queue_class to keep track of the sync
IO ops, and changed vdev_queue_class_length to reference this variable
instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: MigeljanImeri <ImeriMigel@gmail.com>
Closes #15478
Mark Johnston [Tue, 7 Nov 2023 14:57:32 +0000 (09:57 -0500)]
stand: Rename LIBFDT to LIBSAFDT
Preemptively address a collision with LIBFDT (to be added in the future)
from src.libnames.mk, which gets included via bsd.progs.mk. No
functional change intended.
The commit that purported to fix CVE-2014-8611 (805288c2f062) only hid
it behind another bug. Two later commits, 86a16ada1ea6 and 44cf1e5eb470, attempted to address this new bug but mostly just confused
the issue. This commit rolls back the three previous changes and fixes
CVE-2014-8611 correctly.
The key to understanding the bug (and the fix) is that `_w` has
different meanings for different stream modes. If the stream is
unbuffered, it is always zero. If the stream is fully buffered, it is
the amount of space remaining in the buffer (equal to the buffer size
when the buffer is empty and zero when the buffer is full). If the
stream is line-buffered, it is a negative number reflecting the amount
of data in the buffer (zero when the buffer is empty and negative buffer
size when the buffer is full).
At the heart of `fflush()`, we call the stream's write function in a
loop, where `t` represents the return value from the last call and `n`
the amount of data that remains to be written. When the write function
fails, we need to move the unwritten data to the top of the buffer
(unless nothing was written) and adjust `_p` (which points to the next
free location in the buffer) and `_w` accordingly. These variables have
already been set to the values they should have after a successful
flush, so instead of adjusting them down to reflect what was written,
we're adjusting them up to reflect what remains.
The bug was that while `_p` was always adjusted, we only adjusted `_w`
if the stream was fully buffered. The fix is to also adjust `_w` for
line-buffered streams. Everything else is just noise.
Fixes: 805288c2f062 Fixes: 86a16ada1ea6 Fixes: 44cf1e5eb470
Sponsored by: Klara, Inc.
linuxkpi: races between linux_queue_delayed_work_on() and linux_cancel_delayed_work_sync()
1. Suppose that linux_queue_delayed_work_on() is called with
non-zero delay and found the work.state WORK_ST_IDLE. It
resets the state to WORK_ST_TIMER and locks timer.mtx. Now, if
linux_cancel_delayed_work_sync() was also called meantime, read
state as WORK_ST_TIMER and already taken the mutex, it is executing
callout_stop() on non-armed callout. Then linux_queue_delayed_work_on()
continues and schedules callout. But the return value from cancel() is
false, making it possible to the requeue from callback to slip in.
2. If linux_cancel_delayed_work_sync() returned true, we need to cancel
again. The requeue from callback could have revived the work.
The end result is that we schedule callout that might be freed, since
cancel_delayed_work_sync() claims that everything was stopped. This
contradicts the way the KPI is used in Linux, where consumers expect
that cancel_delayed_work_sync() is reliable on its own.
Zhenlei Huang [Tue, 7 Nov 2023 04:45:25 +0000 (12:45 +0800)]
kern linker: Do not retry loading modules on EEXIST
LINKER_LOAD_FILE() calls linker_load_dependencies() which will return
EEXIST in case the module to be loaded has already been compiled into
the kernel. Since the format of the module is now recognized then there
is no need to retry loading with a different linker, otherwise the
userland will get misleading error number ENOEXEC.
Rick Macklem [Mon, 6 Nov 2023 22:25:30 +0000 (14:25 -0800)]
nfscl: newnfs_copycred() cannot be called when a mutex is held
Since newnfs_copycred() calls crsetgroups() which in turn calls
crextend() which might do a malloc(M_WAITOK), newnfs_copycred()
cannot be called with a mutex held. Fortunately, the malloc()
call is rarely done, since XU_GROUPS is 16 and the NFS client
uses a maximum of 17 (only 17 groups will cause the malloc() to
be called). Further, it is only a problem if the malloc() tries
to sleep(). As such, this bug does not seem to have caused
problems in practice.
This patch fixes the one place in the NFS client where
newnfs_copycred() is called while a mutex is held by moving the
call to after where the mutex is released.
Found by inspection while working on an experimental patch.
Mark Johnston [Mon, 6 Nov 2023 19:57:56 +0000 (14:57 -0500)]
e6000sw: Fix locking in miibus_{read,write}reg implementations
Commit 469290648005e13b819a19353032ca53dda4378f made e6000sw's
implementation of miibus_(read|write)reg assume that the softc lock is
held. I presume that is to avoid lock recursion in e6000sw_attach() ->
e6000sw_attach_miibus() -> mii_attach() -> MIIBUS_READREG().
However, the lock assertion in e6000sw_readphy_locked() can fail if a
different driver uses the interface to probe registers. Work around the
problem by providing implementations which lock the softc if it is not
already locked.