Mark Johnston [Thu, 21 Dec 2023 18:26:13 +0000 (13:26 -0500)]
ufs: Update *eofflag upon a read of an unlinked directory
If the directory is unlinked, no further entries will be returned, but
we return no error. At least one caller (vn_dir_next_dirent()) asserts
that a VOP_READDIR call which returns no error and no entries will set
*eofflag != 0, so the current behaviour of UFS can trigger an assertion
failure.
Kristof Provost [Thu, 21 Dec 2023 17:20:37 +0000 (18:20 +0100)]
pf: export missing state information
We did not export all of the information pfctl expected to print via the
new netlink code. This manifested as pfctl printing 'rtableid: 0', even
when there is no rtable set.
While we're addressing that also export other missing fields such as
dummynet, min_ttl, max_mss, ..
Michael Gmelin [Wed, 20 Dec 2023 20:21:55 +0000 (21:21 +0100)]
libifconfig: Fix bridge status member list
When this functionality was moved to libifconfig in 3dfbda3401abea84da9,
the end of list calculation was modified for unknown reasons, practically
limiting the number of bridge member returned to (about) 102.
This patch changes the calculation back to what it was originally and
adds a unit test to verify it works as expected.
Reported by: Patrick M. Hausen (via ML)
Reviewed by: kp
Approved by: kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43135
When trying to use a VLAN device (e.g. "em0.123") with a dot
the library fails to parse the interface correctly. The former
pattern is much too restrictive given that almost all characters
can be coerced into a device name via ifconfig.
Remove the particularly restrictive validation. Some characters
still cannot be used as an interface name as they are used as
delimiters in the syntax, but this allows to be able to use most
of them without an issue.
Warner Losh [Wed, 20 Dec 2023 19:09:09 +0000 (12:09 -0700)]
vtnet: Adjust rx buffer so IP header 32-bit aligned
Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating them.
This improves performance everywhere, and allows armv7 to work at all.
PR: 271288 (PR had a different fix than I wound up with)
MFC After: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D43136
lib/libc/amd64/string/strcspn.S: always return earliest match in 17--32 char case
When matching against a set of 17--32 characters, strcspn() uses two
invocations of PCMPISTRI to match against the first 16 characters
of the set and then the remaining characters. If a match was found in
the first half of the set, the code originally immediately returned
that match. However, it is possible for a match in the second half of
the set to occur earlier in the vector, leading to that match being
overlooked.
Fix the code by checking if there is a match in the second half of the
set and taking the earlier of the two matches.
The correctness of the function has been verified with extended unit
tests and test runs against the glibc test suite.
Approved by: mjg (implicit, via IRC)
MFC after: 1 week
MFC to: stable/14
Dimitry Andric [Wed, 20 Dec 2023 16:08:26 +0000 (17:08 +0100)]
Add missing sources to libclang_rt Makefiles, clean up unneeded ones
During the llvm-17 merge, a few new source files were not added to the
libclang_rt Makefiles, in particular sanitizer_thread_arg_retval.cpp
which is now required for AddressSanitizer and MemorySanitizer. Also,
MemorySanitizer now requires msan_dl.cpp.
While here, clean out a number of source files that compile into nothing
(because they only contain non-FreeBSD parts). Also, remove a duplicated
instance of tsan_new_delete.cpp from libclang_rt.tsan, since it is only
supposed to live in libclang_rt.tsan_cxx.
Gleb Smirnoff [Tue, 19 Dec 2023 19:24:17 +0000 (11:24 -0800)]
tcp: always set tcp_tun_port to a correct value
The tcp_tun_port field that is used to pass port value between UDP
and TCP in case of tunneling is a generic field that used to pass
data between network layers. It can be contaminated on entry, e.g.
by a VLAN tag set by a NIC driver. Explicily set it, so that it
is zeroed out in a normal not-tunneled TCP. If it contains garbage,
tcp_twcheck() later can enter wrong block of code and treat the packet
as incorrectly tunneled one. On main and stable/14 that will end up
with sending incorrect responses, but on stable/13 with ipfw(8) and
pcb-matching rules it may end up in a panic.
This is a minimal conservative patch to be merged to stable branches.
Later we may redesign this.
Gleb Smirnoff [Tue, 19 Dec 2023 18:21:56 +0000 (10:21 -0800)]
tcp_hpts: make the module unloadable
Although the HPTS subsytem wasn't initially designed as a loadable
module, now it is so. Make it possible to also unload it, but for
safety reasons hide that under 'kldunload -f'.
Gleb Smirnoff [Tue, 19 Dec 2023 18:21:56 +0000 (10:21 -0800)]
tcp_hpts: use tcp_pace.cts_last_ran for last ran table
Remove the global cts_last_ran and use already existing unused field of
struct tcp_hptsi, which seems originally planned to hold this table. This
makes it consistent with other malloc-ed tables, like main array of HPTS
entities and CPU groups.
Kristof Provost [Tue, 19 Dec 2023 13:15:48 +0000 (14:15 +0100)]
atf.test: fix installation of python test scripts
Python test scripts get processed (to add the `#! /usr/libexec/
atf_pytest_wrapper` shebang line), into a .xtmp file, and installed from
there. However, as there was no dependency of this .xtmp file on the
original file we kept reinstalling the .xtmp file, even if the original
had been edited already.
This could cause great confusion when debugging python test scripts.
Mike Karels [Tue, 19 Dec 2023 15:33:33 +0000 (09:33 -0600)]
tmpfs: increase memory reserve to a percent of available memory + swap
The tmpfs memory reserve defaulted to 4 MB, and other than that,
all of available memory + swap could be allocated to tmpfs files.
This was dangerous, as the page daemon attempts to keep some memory
free, using up swap, and then resulting in processes being killed.
Increase the reserve to a fraction of available memory + swap at
file system startup time. The limit is expressed as a percentage
of available memory + swap that can be used, and defaults to 95%.
The percentage can be changed via the vfs.tmpfs.memory_percent sysctl,
recomputing the reserve with the new percentage but the initial
available memory + swap. Note that the reserve can also be set
directly with an existing sysctl, ignoring the percentage. The
previous behavior can be specified by setting vfs.tmpfs.memory_percent
to 100.
Add sysctl for vfs.tmpfs.memory_percent and the pre-existing
vfs.tmpfs.memory_reserved to tmpfs(5).
Mike Karels [Tue, 19 Dec 2023 15:32:58 +0000 (09:32 -0600)]
tmpfs: enforce size limit on writes when file system size is default
tmpfs enforced the file system size limit on writes for file systems
with a specified size, but not when the size was the default. Add
enforcement when the size is default: do not allocate additional
pages if the available memory + swap falls to the reserve level.
Note, enforcement is also done when attempting to create a file,
both with and without an explicit file system size.
Bjoern A. Zeeb [Tue, 12 Dec 2023 01:59:17 +0000 (01:59 +0000)]
LinuxKPI: 802.11: more TXQ implementation and locking
Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
while looking at the code. They are needed by various wireless drivers.
Introduce an ltxq lock and protect the skbq by that.
This prevents panics due to a race between a driver upcall and
the net80211 tx downcall. While the former should be rcu protected we
cannot rely on that.
It remains questionable if we need to protect further fields there
(with a different lock?).
Also introduce a txq_mtx on the lhw which needs to be further deployed
but we need to come up with a good strategy to not end up with 7 different
locks.
Sponsored by: The FreeBSD Foundation
PR: 274178, 275710
Tested by: cc
MFC after: 3 days
Robert Wing [Tue, 19 Dec 2023 00:40:46 +0000 (15:40 -0900)]
tty: delete knotes when TTY is revoked
Do not clear knotes from the TTY until it gets dealloc'ed, unless the
TTY is being revoked, in that case delete the knotes when closed is
called on the TTY.
When knotes are cleared from a knlist, those knotes become detached from
the knlist. And when an event is triggered on a detached knote there
isn't an associated knlist and therefore no lock will be taken when the
event is triggered.
This becomes a problem when a detached knote is triggered on a TTY since
the mutex for a TTY is also used as the lock for its knlists. This
scenario ends up calling the TTY event handlers without the TTY lock
being held and tripping on asserts in the event handlers.
Mark Johnston [Mon, 18 Dec 2023 22:45:24 +0000 (17:45 -0500)]
nvme: Initialize HMB entries before loading them into the controller
struct nvme_hmb_desc contains a pad field which was not getting
initialized before being synced. This doesn't have much consequence but
triggers a report from KMSAN, which verifies that host-filled DMA memory
is initialized before it is made visible to the device. So, let's just
initialize it properly.
Brooks Davis [Mon, 18 Dec 2023 22:28:42 +0000 (22:28 +0000)]
arm/SYS.h: align with other arches
Rename SYSTRAP() macro to _SYSCALL() and add _SYSCALL_BODY() which invokes
the syscall via _SYCALL() and then calls cerror as required. Use to
implement PSEUDO() and RSYSCALL() removing _SYSCALL_NOERROR().
Brooks Davis [Mon, 18 Dec 2023 22:28:42 +0000 (22:28 +0000)]
{amd64,i386}/SYS.h: add _SYSCALL and _SYSCALL_BODY
Add a _SYSCALL(name) which calls the SYS_name syscall. Use it to add a
_SYSCALL_BODY() macro which invokes the syscall and calls cerror as
required. Use the latter to implement PSEUDO() and RSYSCALL().
Mark Johnston [Tue, 12 Dec 2023 14:35:48 +0000 (09:35 -0500)]
kqueue.9: Update the description of knlist_clear()
knlist_clear() does not free knotes and so does not call fdrop(), so
remove the bit of the function description which claims otherwise. (The
knote will be dropped by the next queue scan, and it is at that point
that the fd reference will be dropped.)
Warner Losh [Mon, 18 Dec 2023 17:06:27 +0000 (10:06 -0700)]
efibootmgr: Report the path to the device
Report the entire path to the device, rather than the the bit after /dev/
for the --esp command. Nothing in the tree depends on the output
format: Only bsdinstall's bootconfig script calls efibootmgr, and it
doesn't use the --esp/-E flag.
Warner Losh [Mon, 18 Dec 2023 04:32:15 +0000 (21:32 -0700)]
efibootmgr: Document -e command line switch
-e env will include `env` in the boot loader. Document that the boot
loader appends the `env` to the BootXXXX variable, and will parse it as
a series of a=b values to set in the boot loader's environment. These
assignments are separated by spaces. The env arg needs to be quoted if
more than one env var is to be set (we parse only the next argument on
the command line).
Alan Somers [Wed, 15 Nov 2023 17:56:05 +0000 (10:56 -0700)]
Remove _POSIX_PRIORITIZED_IO references from man pages
We don't support it, so there's no need to tell readers what would
happen if we did. Also, don't remind the user that a certain field is
ignored by aio_read. Mentioning every ignored field would make the man
pages too verbose.
Kyle Evans [Mon, 18 Dec 2023 05:53:03 +0000 (23:53 -0600)]
calendar: correct the search order for files
Include files that don't begin with a '/' are documented to search the
current directory, then /usr/share/calendar. This hasn't been accurate
for years, since e061f95e7b9b ("Rework calendar(1) parser") rewrote a
lot of this.
Stash off the cwd before we do any chdir()ing around and use that to
honor the same order we'll follow for the -f flag. This may result in
an extra lookup that will fail for the initial calendar file, but I
don't think it's worth the complexity to avoid it.
While we're here, fix the documentation to just reference the order
described in FILES so that we only need to keep it up to date in one
place.
Reviewed by: bapt
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D42278
dummynet: add simple gilbert-elliott channel model
Have a simple Gilbert-Elliott channel model in
dummynet to mimick correlated loss behavior of
realistic environments. This allows simpler testing
of burst-loss environments.
Umer Saleem [Fri, 15 Dec 2023 22:18:27 +0000 (03:18 +0500)]
Test LWB buffer overflow for block cloning
PR#15634 removes 128K into 2x68K LWB split optimization, since it
was found to cause LWB buffer overflow while trying to write 128KB
TX_CLONE_RANGE record with 1022 block pointers into 68KB buffer,
with multiple VDEVs ZIL.
This commit adds a test for this particular scenario by writing
maximum sizes TX_CLONE_RANE record with 1022 block pointers into
68KB buffer, with two SLOG devices.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #15672
ufs: do not leave around empty buffers shadowing disk content
If the ffs_write() operation specified to overwrite the whole buffer,
ffs tries to save the read by not validating allocated buffer. Then
uiommove() might fail with EFAULT, in which case pages are left zeroed
and marked valid but not read from the disk. Then vn_io_fault() logic
retries the write after holding the user pages to avoid EFAULTs. In
erronous case of really faulty buffer, or in contrived case of writing
from file to itself, we are left with zeroed buffer instead of valid
content written back to disk.
Handle the situation by releasing non-cached buffer on fault, instead
of clearing it. Note that buffers with alive dependencies cannot be
released, but also either they cannot have valid content on the disk
because dependency on data buffer means that it was not yet written, or
they were reallocated by fragment extension or ffs_reallocbks(), and are
already fully valid.
Reported by: kevans
Discussed with: mav
In collaboration with: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Brad Davis [Fri, 15 Dec 2023 18:08:50 +0000 (10:08 -0800)]
release/Makefile.vm: Add cloudware overrides
Allow the cloudware *_FLAVOURS and *_FSLIST values to be overridden
at the command line, to assist users who want to e.g. build only one
of the many EC2 AMIs available.
Alexander Motin [Fri, 15 Dec 2023 17:51:41 +0000 (12:51 -0500)]
dmu: Allow buffer fills to fail
When ZFS overwrites a whole block, it does not bother to read the
old content from disk. It is a good optimization, but if the buffer
fill fails due to page fault or something else, the buffer ends up
corrupted, neither keeping old content, nor getting the new one.
On FreeBSD this is additionally complicated by page faults being
blocked by VFS layer, always returning EFAULT on attempt to write
from mmap()'ed but not yet cached address range. Normally it is
not a big problem, since after original failure VFS will retry the
write after reading the required data. The problem becomes worse
in specific case when somebody tries to write into a file its own
mmap()'ed content from the same location. In that situation the
only copy of the data is getting corrupted on the page fault and
the following retries only fixate the status quo. Block cloning
makes this issue easier to reproduce, since it does not read the
old data, unlike traditional file copy, that may work by chance.
This patch provides the fill status to dmu_buf_fill_done(), that
in case of error can destroy the corrupted buffer as if no write
happened. One more complication in case of block cloning is that
if error is possible during fill, dmu_buf_will_fill() must read
the data via fall-back to dmu_buf_will_dirty(). It is required
to allow in case of error restoring the buffer to a state after
the cloning, not not before it, that would happen if we just call
dbuf_undirty().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15665
xen: add atomic #defines to accomodate differing xen_ulong_t sizes
Alas, ARM declared xen_ulong_t to be 64-bits long, unlike i386 where
it matches the word size. As a result, compatibility wrappers are
needed for Xen atomic operations.
These are in fact GPLv2 when distributed with the Linux kernel, but the
license also allows MIT if distributed separately. Add the markers to
avoid interference by automated tools.
Notable upstream pull request merges:
#15643 a9b937e06 For db_marker inherit the db pointer for AVL comparision
#15644 e53e60c0b DMU: Fix lock leak on dbuf_hold() error
#15653 86063d903 dbuf: Handle arcbuf assignment after block cloning
#15656 86e115e21 dbuf: Set dr_data when unoverriding after clone
Kristof Provost [Wed, 13 Dec 2023 12:32:08 +0000 (13:32 +0100)]
mcast: fix leaked igmp packets on multicast cleanup
When we release a multicast address (e.g. on interface shutdown) we may
still have packets queued in inm_scq. We have to free those, or we'll
leak memory.
This commit caused us to not send IGMP leave messages if the inpcb went
away. In other words: we freed pending packets whenever the socket
closed rather than when the interface (or address) goes away.
Jessica Clarke [Thu, 14 Dec 2023 20:17:20 +0000 (20:17 +0000)]
kldxref: Reduce divergence between per-architecture files
Note that relbase is always 0 for DSOs so its omission for __KLD_SHARED
architectures was not a bug in practice.
Whilst here, also parenthesise the dest offset for where to avoid
transiently creating an out-of-bounds pointer, which is UB (though even
on CHERI architectures, where capability bounds compression can result
in that creating invalid capabilities that will trap on dereference,
optimisation will reassociate to the correct form in practice and thus
work just fine).
Kenneth D. Merry [Thu, 14 Dec 2023 20:05:17 +0000 (15:05 -0500)]
mpr, mps: Establish busdma boundaries for memory pools
Most all of the memory used by the cards in the mpr(4) and mps(4)
drivers is required, according to the specs and Broadcom developers,
to be within a 4GB segment of memory.
This includes:
System Request Message Frames pool
Reply Free Queues pool
ReplyDescriptorPost Queues pool
Chain Segments pool
Sense Buffers pool
SystemReply message pool
We got a bug report from Dwight Engen, who ran into data corruption
in the BAE port of FreeBSD:
> We have a port of the FreeBSD mpr driver to our kernel and recently
> I found an issue under heavy load where a DMA may go to the wrong
> address. The test system is a Supermicro X10SRH-CLN4F with the
> onboard SAS3008 controller setup with 2 enterprise Micron SSDs in
> RAID 0 (striped). I have debugged the issue and narrowed down that
> the errant DMA is one that has a segment that crosses a 4GB
> physical boundary. There are more details I can provide if you'd
> like, but with the attached patch in place I can no longer
> re-create the issue.
> I'm not sure if this is a known limit of the card (have not found a
> datasheet/programming docs for the chip) or our system is just
> doing something a bit different. Any helpful info or insight would
> be welcome.
> Anyway, just thought this might be helpful info if you want to
> apply a similar fix to FreeBSD. You can ignore/discard the commit
> message as it is my internal commit (blkio is our own tool we use
> to write/read every block of a device with CRC verification which
> is how I found the problem).
> Test case was two SSD's in RAID 0 (stripe). The logical disk was
> then partitioned into two partitions. One partition had lots of
> filesystem I/O and the other was initially filled using blkio with
> CRCable data and then read back with blkio CRC verify in a loop.
> Eventually blkio would report a bad CRC block because the physical
> page being read-ahead into didn't contain the right data. If the
> physical address in the arq/segs was for example 0x500003000 the
> data would actually be DMAed to 0x400003000.
The original patch was against mpr(4) before busdma templates were
introduced, and only affected the buffer pool (sc->buffer_dmat) in
the mpr(4) driver. After some discussion with Dwight and the
LSI/Broadcom developers and looking through the driver, it looks
like most of the queues in the driver are ok, because they limit
the memory used to memory below 4GB. The buffer queue and the chain
frames seem to be the exceptions.
This is pretty much the same between the mpr(4) and mps(4) drivers.
So, apply a 4GB boundary limitation for the buffer and chain frame pools
in the mpr(4) and mps(4) drivers.
Jessica Clarke [Thu, 14 Dec 2023 16:40:08 +0000 (16:40 +0000)]
Don't try and run kldxref for arm kernels
Surprisingly, kldxref does not currently support arm, and unhelpfully
this means it silently does nothing rather than give an error, so the
linker.hints entry added to the METALOG for -DNO_ROOT builds (and
pkgbase ones) refers to a file that doesn't exist. Ideally it would be
supported (and ideally the METALOG handling would be less fragile, but
without integrating it into kldxref the only real option would be to
just run find(1) to get the list of linker.hints files, which feels a
little backwards), but for now just paper over this by skipping the
build step on arm.
Reported by: bapt
Fixes: ff7c12c1f17e ("Make kldxref a bootstrap tool and use unconditionally")
If the destination file exists but we decide unlink it, set the dne
flag. This means we don't need to re-check the conditions that would
have caused us to delete the file when we later need to decide whether
to create or replace it.
- The HLPR flags are grouped together at the beginning because they are
the standard flags for programs using FTS. Move the N flag out from
among them to its correct place in the sequence.
- The Pflag variable isn't used outside main(), but moving it out lets
us skip initialization and keeps it with its friends H, L and R.
This test case tests two different things: first, that copying a symlink
results in a file with the same contents as the target of the symlink,
rather than a second symlink, and second, that cp will refuse to copy a
file to itself, or to a link to itself, or a link to its target. Leave
the first part in basic_symlink, move the second part to a new test case
named samefile, and slightly expand both cases.
Frank Hilgendorf [Wed, 13 Dec 2023 23:48:08 +0000 (23:48 +0000)]
bwn: remove unused ic_headroom
Unlike bwi(4), bwn(4) does not rely on ic_headroom (despite having it
set) but splits the bwn_txhdr (first) segment into its own transaction.
Remove ic_headroom to avoid net80211 troubles with not enough space in
the mbuf around ieee80211_mbuf_adjust().
Rewrite `copy_file()` so the lflag and sflag are handled as early as
possible instead of constantly checking that they're not set and then
handling them at the end. This also opens the door to changing the
failure logic at some future point (for instance, we might decide to
fall back to copying if `errno` indicates that the file system does not
support links).
Jessica Clarke [Wed, 13 Dec 2023 21:43:10 +0000 (21:43 +0000)]
Make kldxref a bootstrap tool and use unconditionally
Now that kldxref is a generic cross tool and can be built on non-FreeBSD
we can bootstrap it during the build and thus remove the condition for
whether it exists. We also need to make sure to add it to the METALOG
for -DNO_ROOT builds.
Jessica Clarke [Wed, 13 Dec 2023 21:43:09 +0000 (21:43 +0000)]
Makefile.inc1: Forward on METALOG and DISTBASE for kernel targets
Currently IMAKE_INSTALL, which includes -M METALOG, is enough for the
sub-makes to work, but using kldxref for -DNO_ROOT builds will require
manually adding linker.hints to the METALOG, and thus both METALOG
itself and DISTBASE need to be exposed directly to the sub-makes, so do
so.
Jessica Clarke [Wed, 13 Dec 2023 21:43:09 +0000 (21:43 +0000)]
tools/build: Provide sys/linker_set.h when cross-building
This is needed for kldxref, which will shortly become a bootstrap tool.
Linux can use the same one as FreeBSD (provided the cross-building
sys/cdefs.h is augmented appropriately), whilst macOS needs its own
Mach-O-specific implementation.