Gleb Smirnoff [Tue, 7 May 2024 21:15:49 +0000 (14:15 -0700)]
sockets: garbage collect PRCOREQUESTS and stale comment
The deleted code predates FreeBSD history. The deleted comment is 99%
outdated. Why KAME decided to use these constants instead of the normal
ones is also lost to the centuries.
John Baldwin [Tue, 7 May 2024 20:48:06 +0000 (13:48 -0700)]
nvme: Bump the alignment of struct nvme_health_information_page to 8
This ensures that embedded uint64_t values used for statistics
counters are aligned when allocating a structure on the stack or as
part of a containing structure. In particular this quiets
-Waddress-of-packed-member warnings from GCC when compiling the code
in nvmfd to update the stats.
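The effect of the alignment bump in the nvme commit above can be illustrated with a small sketch. The structures below are hypothetical stand-ins, not the real nvme_health_information_page layout, and the attribute spelling assumes GCC or Clang. A packed structure has alignment 1, so the address of an embedded uint64_t may be misaligned; aligning the whole structure to 8 (with the counters at an offset that is a multiple of 8) guarantees naturally aligned counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed structure: alignment is 1, so it may start anywhere. */
struct demo_page_packed {
	uint8_t  critical_warning;
	uint16_t temperature;
	uint8_t  reserved[5];
	uint64_t data_units_read[2];
} __attribute__((packed));

/* Same layout, but every instance starts on an 8-byte boundary. */
struct demo_page_aligned {
	uint8_t  critical_warning;
	uint16_t temperature;
	uint8_t  reserved[5];
	uint64_t data_units_read[2];
} __attribute__((packed, aligned(8)));

/* The counters sit at a multiple of 8 inside the structure. */
static_assert(offsetof(struct demo_page_aligned, data_units_read) % 8 == 0,
    "counter offset must be a multiple of 8");

/* Report whether the field at 'offset' within 'page' is aligned for uint64_t. */
static bool
counters_aligned(const void *page, size_t offset)
{
	assert(page != NULL);
	return (((uintptr_t)page + offset) % _Alignof(uint64_t) == 0);
}
```

With the aligned variant, a stack-allocated instance always passes counters_aligned(); with the packed-only variant the result depends on where the compiler happens to place the object.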
John Baldwin [Tue, 7 May 2024 20:45:51 +0000 (13:45 -0700)]
nvmecontrol: Fix a sign compare mismatch
Even though mqes (uint16_t) and queue_size (u_int) are both unsigned,
the expression 'mqes + 1' gets promoted to int which is signed. Keep
the value unsigned by explicitly promoting mqes to u_int before
incrementing the value.
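The promotion rule described in the nvmecontrol commit above can be shown in a few lines. This is a minimal sketch, not the code from the commit: the function name queue_size_fits is hypothetical, and only the two types (uint16_t for mqes, unsigned int for queue_size) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The cast only helps where unsigned int is wider than uint16_t. */
static_assert(sizeof(unsigned int) > sizeof(uint16_t),
    "unsigned int must be wider than uint16_t");

/*
 * mqes is zero-based, so the largest permitted queue has mqes + 1 entries.
 * Written as 'mqes + 1', the operand is promoted to (signed) int and the
 * comparison mixes signed and unsigned values.  Casting first keeps the
 * whole expression in unsigned int.
 */
static bool
queue_size_fits(uint16_t mqes, unsigned int queue_size)
{
	return (queue_size <= (unsigned int)mqes + 1);
}
```

The computed value is the same either way for every possible mqes; the cast only changes the type of the expression, which is what silences the sign-compare warning.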
Mitchell Horne [Wed, 14 Feb 2024 16:56:13 +0000 (12:56 -0400)]
busdma: better handling of small segment bouncing
Typically, when a DMA transaction requires bouncing, we will break up
the request into segments that are, at maximum, page-sized.
However, in the atypical case of a driver whose maximum segment size is
smaller than PAGE_SIZE, we end up inefficiently assigning each segment
its own bounce page. For example, the dwmmc driver has a maximum segment
size of 2048 (PAGE_SIZE / 2); a 4-page transfer ends up requiring 8
bounce pages in the current scheme.
We should attempt to batch segments into bounce pages more efficiently.
This is achieved by pushing all considerations of the maximum segment
size into the new _bus_dmamap_addsegs() function, which wraps
_bus_dmamap_addseg(). Thus we allocate the minimal number of bounce
pages required to complete the entire transfer, while still performing
the transfer with smaller-sized transactions.
For most drivers with a segment size >= PAGE_SIZE, this will have no
impact. For drivers like dwmmc mentioned above, this improves the memory
and performance efficiency when bouncing a large transfer.
Co-authored-by: jhb
Reviewed by: jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45048
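The page-count arithmetic from the small-segment bouncing commit above can be sketched as follows. The helper names are hypothetical and the model is simplified: it assumes a page-aligned buffer and ignores every other busdma constraint. It only compares one bounce page per segment against batching segments into shared pages.

```c
#include <assert.h>
#include <stddef.h>

/* Old scheme: each segment (at most min(maxsegsz, pagesz)) gets its own page. */
static size_t
bounce_pages_per_segment(size_t len, size_t maxsegsz, size_t pagesz)
{
	size_t seg;

	assert(maxsegsz > 0 && pagesz > 0);
	seg = maxsegsz < pagesz ? maxsegsz : pagesz;
	return ((len + seg - 1) / seg);
}

/* Batched scheme: segments share pages, so ceil(len / pagesz) pages suffice. */
static size_t
bounce_pages_batched(size_t len, size_t pagesz)
{
	assert(pagesz > 0);
	return ((len + pagesz - 1) / pagesz);
}
```

For the dwmmc case in the commit message (maximum segment size 2048, a 4-page transfer with 4096-byte pages) the first helper yields 8 pages and the second yields 4. When the maximum segment size is at least the page size, both helpers agree, which matches the statement that most drivers see no change.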
Mitchell Horne [Wed, 14 Feb 2024 17:01:15 +0000 (13:01 -0400)]
busdma: deduplicate _bus_dmamap_addseg() function
It is functionally identical in all implementations, so move the
function to subr_busdma_bounce.c. The KASSERT present in the x86 version
is now enabled for all architectures. It should be universally
applicable.
Reviewed by: jhb
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45047
Marko Zec [Tue, 7 May 2024 15:44:09 +0000 (17:44 +0200)]
fib_dxr: set fib_data field in struct dxr_aux early enough
Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures. This
may have led to NULL ptr dereferencing in dxr_change_rib_batch().
Warner Losh [Tue, 7 May 2024 02:06:54 +0000 (20:06 -0600)]
boot1.efi: Don't redundantly include devpath.c
devpath.c is on both the command line and in libefi. This is redundant
and was a mistake in 4cf36aa1017f9. It never should have been here. In
practice, this just means that the devpath.o from libefi.a goes unused.
This will cause problems with some upcoming changes (D44872) to enable
LTO to reduce the size of the binaries, so go ahead and make the change
now to reduce the changeset for that. No functional change intended.
Ed Maste [Tue, 7 May 2024 01:45:50 +0000 (21:45 -0400)]
dlopen(3): mention fdlopen for capsicum(4)
Capsicum-sandboxed applications generally cannot use dlopen, as absolute
and cwd-relative paths cannot be accessed. Mention that fdlopen is
useful for sandboxed applications.
PR: 277169
Reviewed by: markj, oshogbo
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45108
Warner Losh [Mon, 6 May 2024 22:28:09 +0000 (16:28 -0600)]
sg: Add sg(4) man page
Add minimal sg(4) manual page. This implements a subset of the Linux
IOCTL interface for either native FreeBSD programs, or for Linux
binaries in the linuxulator.
Gleb Smirnoff [Mon, 6 May 2024 22:25:53 +0000 (15:25 -0700)]
lagg: enable tests that stress the configuration changes
I wasn't able to reproduce a crash in several runs. It may be that
48698ead6ff0 or earlier changes have closed the races. If crashes are
reported with the newly enabled tests, I will either work on them or
disable the tests again.
Gleb Smirnoff [Mon, 6 May 2024 22:25:53 +0000 (15:25 -0700)]
lagg: remove use of net epoch in the ioctl paths
Rely on LAGG_SLOCK() instead. The use of network epoch(9) here was added
in 6573d7580b851 (later tidied by 87bf9b9cbeebc) as a large sweep that
blindly substituted blocking kernel primitives with epoch(9). In these
particular code paths use of epoch(9) is incorrect and doesn't provide any
protection against a stale pointer. Recent fix 48698ead6ff0, which should
actually have removed the epoch use, created a potential sleeping in epoch
problem.
Gleb Smirnoff [Mon, 6 May 2024 22:25:53 +0000 (15:25 -0700)]
lagg: propagate up/down to the children
Based on the old submission from asomers@. With modern state of locking
in lagg(4), the patch got much simpler. Enable the test that was
waiting for this change.
John Baldwin [Mon, 6 May 2024 20:30:23 +0000 (13:30 -0700)]
nvmf: Remove packing pragmas from nvmf_proto.h
The protocol structures do not need explicit packing and static
assertions verify the size of all the structures as well as the
offsets of several key fields. The pragma triggers warnings when
building with GCC.
Colin Percival [Mon, 6 May 2024 20:26:52 +0000 (13:26 -0700)]
release: Rework vm_extra_pre_umount
The vm_extra_pre_umount function in vmimage.subr served two purposes:
It removed /etc/resolv.conf and /qemu (if cross-building), and it
provided a function for cloudware to override in order to make cloud
specific changes to the filesystem before constructing a disk image.
This resulted in a number of bugs:
1. When cross-building, the emulator binary was left as /qemu in the
Azure, GCE, Openstack and Vagrant images.
2. The build host's resolv.conf was left as /etc/resolv.conf in the
basic-ci and basic-cloudinit images.
3. When building GCE images, a Google-specific resolv.conf file was
constructed, and then deleted before the disk image was created.
Move the bits needed for running code inside a VM staging directory
from vm_install_base into a new vm_emulation_setup routine, and move
the corresponding cleanup bits from vm_extra_pre_umount to a new
vm_emulation_cleanup routine.
Remove the /qemu and /etc/resolv.conf cleanups from the cloudware
configuration files (where they exist) since we will now be running
vm_emulation_cleanup to remove those even when vm_extra_pre_umount
has been overridden.
Override vm_emulation_cleanup in gce.conf since in that one case (and
*only* that one case) we don't want to clean up resolv.conf (since it
was constructed for the VM image rather than copied from the host).
Gleb Smirnoff [Mon, 6 May 2024 19:03:20 +0000 (12:03 -0700)]
tests/fusefs: fix all tests that depend on kern.maxphys
The tests try to read kern.maxphys sysctl into int value, while
unsigned long is required. Not sure when this was broken, seems like
since cd8537910406e.
pcm/sound.* contains code that should be part of pcm/vchan.*.
Changes:
- pcm_setvchans() -> vchan_setnew()
- pcm_setmaxautovchans() -> vchan_setmaxauto()
- hw.snd.maxautovchans moved to pcm/vchan.c
- snd_maxautovchans declaration moved to pcm/vchan.h and definition to
pcm/vchan.c
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: dev_submerge.ch, markj
Differential Revision: https://reviews.freebsd.org/D45015
hw.snd.version and SND_DRV_VERSION define the sound driver version and
are meant to be used in bug reports, but because these values are
constant, there is not much useful information we can extract from them.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: dev_submerge.ch, emaste
Differential Revision: https://reviews.freebsd.org/D44996
While here, add device_printf()'s to all failure points. Also fix an
existing bug where we'd unlock an already unlocked channel, in case we
went to "out" (now "out2") before locking the channel.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Reviewed by: dev_submerge.ch
Differential Revision: https://reviews.freebsd.org/D44993
John Baldwin [Mon, 6 May 2024 17:49:04 +0000 (10:49 -0700)]
git-arc: Add list mode support for the update command
This can be particularly useful to do bulk-updates of multiple commits
using the same message, e.g.
git arc update -lm "Move function xyz to libfoo" main..myfeature
Similar to the list mode for the create command, git arc will list all
the candidate revisions with a single prompt. Once that is confirmed,
all the revisions are updated without showing the diffs or pausing
for further prompts.
Warner Losh [Mon, 6 May 2024 15:10:46 +0000 (09:10 -0600)]
endian.h: Define uint{16,32,64}_t
The Draft Posix Issue 8 standard requires that these be defined. Define
them in the usual way that lets multiple headers define them. Opted to
not just use #include <stdint.h>, allowed by the draft, to be
conservative. Add notes about how we comply with Issue 8, and that we've
opted to define these only as macros, though the standard allows
functions, macros or both.
adduser: Fix confusion between `uclass` and `_class`.
This caused adduser to produce an invalid `pw(8)` command line. Due to
bugs in `pw(8)`, the command line was silently accepted and led to the
user being created, but locked out and with no home directory.
Also fix the default value for the “Another user?” prompt.
Kristof Provost [Mon, 6 May 2024 09:39:08 +0000 (11:39 +0200)]
if: guard against if_ioctl being NULL
There are situations where a struct ifnet has a NULL if_ioctl pointer.
For example, e6000sw creates such struct ifnets for each of its ports so it can
call into the MII code.
If there is then a link state event this calls do_link_state_change()
-> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() ->
get_operstate_ether(). That wants to know if the link is up or down, so it tries
to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.
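The kind of guard the if_ioctl commit above describes can be sketched like this. Everything here is a hypothetical reduction: struct demo_ifnet stands in for the much larger struct ifnet, demo_if_ioctl is not the kernel's wrapper, and returning EOPNOTSUPP for a missing handler is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cut-down interface with an optional ioctl handler. */
struct demo_ifnet {
	int (*if_ioctl)(struct demo_ifnet *, unsigned long, void *);
};

/* Example handler that accepts every request. */
static int
demo_accept_all(struct demo_ifnet *ifp, unsigned long cmd, void *data)
{
	(void)ifp;
	(void)cmd;
	(void)data;
	return (0);
}

/* Dispatch to the handler, failing cleanly when the interface has none. */
static int
demo_if_ioctl(struct demo_ifnet *ifp, unsigned long cmd, void *data)
{
	assert(ifp != NULL);
	if (ifp->if_ioctl == NULL)
		return (EOPNOTSUPP);	/* assumed error code */
	return (ifp->if_ioctl(ifp, cmd, data));
}
```

Putting the check in the single dispatch point means callers such as the link-state path do not each need their own NULL test.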
Randall Stewart [Sun, 5 May 2024 13:08:47 +0000 (09:08 -0400)]
TCP can be subject to Sack Attacks, let's fix this issue.
There is a type of attack that a TCP peer can launch on a connection. It certainly affects Rack and BBR, and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker drives you to look at hundreds of sack blocks that each update only 1 byte. For example, if you have bytes 1 - 10,000 outstanding, the attacker sends in something like:
ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704)
This first sack looks fine but then the attacker sends
ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705)
ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707)
...
These blocks make you hunt across your linked list and split things up so that you have an entry for every other byte. As your list grows you spend more and more CPU running through the lists. The idea here is that the attacker chooses entries as far apart as possible so that you have to run through the list. This example is small, but in theory, if the window is open to say 1Meg, you could end up with hundreds of thousands of linked list entries.
To combat this we introduce three things:
1. When the peer requests a very small MSS, we stop processing SACKs from them. This prevents a malicious peer from just using a small MSS to do the same thing.
2. Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. Against such an attacker, that means we basically start skipping all but MSS-sized sacked blocks.
3. The sack filter used to throw away data when its bounds were exceeded. Instead, we now increase its size to 15 and then throw away sacks if the filter gets overrun. This prevents a malicious attacker from overrunning the sack filter and thereby forcing us to process things anyway.
The default stack will need to start using the sack-filter which we have talked about in past conference calls to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks).
After this set of changes is in, rack can drop its SAD detection completely.
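The size check described for the sack-filter in the commit above can be sketched as a single predicate. This is an illustration under stated assumptions, not the code from the commit: the function names are hypothetical, and the minimum length is passed in by the caller rather than derived from the smallest v4 MSS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space "less than" that tolerates 32-bit wraparound. */
static bool
seq_lt(uint32_t a, uint32_t b)
{
	return ((int32_t)(a - b) < 0);
}

/*
 * Decide whether a SACK block [start, end) is worth processing.
 * Blocks shorter than min_len are dropped unless they end exactly at
 * snd_max, where a short tail block is legal.
 */
static bool
sack_block_acceptable(uint32_t start, uint32_t end, uint32_t snd_max,
    uint32_t min_len)
{
	assert(min_len > 0);
	if (!seq_lt(start, end))
		return (false);		/* empty or inverted block */
	if (end == snd_max)
		return (true);		/* short tail block is allowed */
	return (end - start >= min_len);
}
```

Under this rule, blocks that cover only a byte or two in the middle of the window are rejected before they can split list entries, while MSS-sized blocks and the final short block still get through.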
Colin Percival [Sun, 5 May 2024 05:31:19 +0000 (22:31 -0700)]
release: Use qemu when cross-building vm images
For a bit over 5 years, we have used qemu when cross-building cloudware
images; in particular, it's necessary when installing packages which
might include post-install scripts.
Use qemu in the vm-images target too; while "generic" vm images don't
install packages, they still run newaliases and /etc/rc.d/ldconfig,
both of which fail without appropriate emulation.
Rick Macklem [Sat, 4 May 2024 21:30:07 +0000 (14:30 -0700)]
nfsd: Fix Link conformance with RFC8881 for delegations
RFC8881 specifies that, when a Link operation occurs on a file over
NFSv4, delegations for that file issued to other clients must
be recalled. Discovered during a recent discussion on nfsv4@ietf.org.
Although I have not observed a problem caused by not doing
the required delegation recall, it is definitely required
by the RFC, so this patch makes the server do the recall.
Tested during a recent NFSv4 IETF Bakeathon event.
Apr 22, 2024:
fixed regex engine gototab reallocation issue that was
introduced during the Nov 24 rewrite. Thanks to Arnold Robbins.
Fixed a scan bug in split in the case the separator is a single
character. thanks to Oguz Ismail for spotting the issue.
Mar 10, 2024:
fixed use-after-free bug in fnematch due to adjbuf invalidating
the pointers to buf. thanks to github user caffe3 for spotting
the issue and providing a fix, and to Miguel Pineiro Jr.
for the alternative fix.
MAX_UTF_BYTES in fnematch has been replaced with awk_mb_cur_max.
thanks to Miguel Pineiro Jr.
Note: This brings in the matchop-deref.* files that were missing (but in
FreeBSD already) and adds system-status.ok2. The latter has been deleted
in FreeBSD since it does not fit ATF well. Care must be taken to remove it
before the merge this time.
Lexi Winter [Sat, 4 May 2024 16:42:40 +0000 (10:42 -0600)]
rc.conf.5: modernise network_interfaces
It's not 1996 anymore, and we use CIDR nowadays. Update the various
ifconfig_ examples to use CIDR notation instead of netmasks, and also
add an example of a basic ifconfig_ entry that most users will be
interested in.
HP van Braam [Sat, 4 May 2024 14:40:15 +0000 (08:40 -0600)]
aic7xxx: make target mode enable a device hint
Previously it was only possible to enable target mode for these drivers
by rebuilding the kernel with AHC_TMODE_ENABLE or AHD_TMODE_ENABLE and a
bitmask of which units to statically enable for target mode.
There are no space savings in the driver from not having AHC_TMODE_ENABLE
set, so in addition to the compile time option let's also introduce some
tunables:
HP van Braam [Sat, 4 May 2024 14:36:47 +0000 (08:36 -0600)]
aic7xxx: aicasm correct include file
aicasm just puts the value of the include file passed with "-i" in the
generated file with quotes around it. This means that manual edits are
made to aic7xxx_reg_print.c and aic79xx_reg_print.c.
Now we check whether the value passed to '-i' starts with a '<'; if it
does, we don't output the quotes.
Signed-off-by: HP van Braam <hp@tmm.cx>
Reviewed by: imp (minor code simplification)
Pull Request: https://github.com/freebsd/freebsd-src/pull/1209
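The quoting rule from the aicasm commit above can be sketched in a few lines. The function name format_include and its buffer-based interface are hypothetical; aicasm itself writes to its output file, and only the rule (no added quotes when the name starts with '<') comes from the commit message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an #include directive for 'name' into 'buf'.  A name that already
 * starts with '<' is emitted as given; anything else is wrapped in quotes.
 * Returns the snprintf() result.
 */
static int
format_include(char *buf, size_t buflen, const char *name)
{
	assert(buf != NULL && name != NULL);
	if (name[0] == '<')
		return (snprintf(buf, buflen, "#include %s\n", name));
	return (snprintf(buf, buflen, "#include \"%s\"\n", name));
}
```

Passing a bracketed name therefore produces a system-style include, and passing a bare file name keeps the previous quoted form, so existing invocations are unaffected.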