If copyin family of routines fault, kernel does clear PSL.AC on the
fault entry, but the AC flag of the faulted frame is kept intact. Since
onfault handler is effectively jump, AC survives until syscall exit.
Reported by: m00nbsd, via Sony
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
admbugs: 975
Mark Johnston [Wed, 12 May 2021 13:39:36 +0000 (09:39 -0400)]
Fix mbuf leaks in various pru_send implementations
The various protocol implementations are not very consistent about
freeing mbufs in error paths. In general, all protocols must free both
"m" and "control" upon an error, except if PRUS_NOTREADY is specified
(this is only implemented by TCP and unix(4) and requires further work
not handled in this diff), in which case "control" still must be freed.
This diff plugs various leaks in the pru_send implementations.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Kristof Provost [Sun, 16 May 2021 06:51:54 +0000 (08:51 +0200)]
pf tests: More set skip on <ifgroup> tests
Test the specific case reported in PR 255852. Clearing the skip flag
on groups was broken because pfctl couldn't work out if a kif was a
group or not, because the kernel no longer set the pfik_group pointer.
Kristof Provost [Sun, 16 May 2021 06:50:17 +0000 (08:50 +0200)]
pf: Set the pfik_group for userspace
Userspace relies on this pointer to work out if the kif is a group or
not. It can't use it for anything else, because it's a pointer to a
kernel address. Substitute 0xfeedc0de for 'true', so that we don't leak
kernel memory addresses to userspace.
Alexander Motin [Sat, 24 Apr 2021 03:18:01 +0000 (23:18 -0400)]
mpr/mps(4): Make device mapping some more robust.
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
Mark Johnston [Tue, 11 May 2021 00:18:00 +0000 (20:18 -0400)]
vfs: Fix error handling in vn_fullpath_hardlink()
vn_fullpath_any_smr() will return a positive error number if the
caller-supplied buffer isn't big enough. In this case the error must be
propagated up, otherwise we may copy out uninitialized bytes.
Reported by: syzkaller+KMSAN
Reviewed by: mjg, kib
MFC aftr: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30198
Mateusz Guzik [Fri, 14 May 2021 07:50:10 +0000 (09:50 +0200)]
vm: add another pager private flag
Contrary to what was done in main, skip the following in order to not
disrupt KBI:
Move OBJ_SHADOWLIST around to let pager flags be next to each other.
Rick Macklem [Sat, 8 May 2021 00:30:56 +0000 (17:30 -0700)]
nfscl: Add support for va_birthtime to NFSv4
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.
Mark Johnston [Fri, 14 May 2021 14:07:56 +0000 (10:07 -0400)]
kqueue timer: Remove detached knotes from the process stop queue
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue. If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.
It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue. Ensure that we do not doubly enqueue the event structure in this
case.
Reported by: syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by: syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30251
Andriy Gapon [Fri, 7 May 2021 07:17:57 +0000 (10:17 +0300)]
storvsc: fix auto-sense reporting
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Andriy Gapon [Thu, 6 May 2021 18:49:37 +0000 (21:49 +0300)]
PCI hot-plug: use dedicated taskqueue for device attach / detach
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
Colin Percival [Sat, 15 May 2021 05:57:38 +0000 (22:57 -0700)]
MFC fixes to hostuuid handling
330f110b:
Fix 'hostuuid: preload data malformed' warning
If the preloaded hostuuid value is invalid and verbose booting is
enabled, a warning is printed. This printf had two bugs:
1. It was missing a trailing \n character.
2. The malformed UUID is printed with %s even though it is not known
to be NUL-terminated.
This commit adds the missing \n and uses %.*s with the (already known)
length of the preloaded UUID to ensure that we don't read past the end
of the buffer.
Reported by: kevans
Fixes: c3188289 Preload hostuuid for early-boot use
b6be9566:
Fix buffer overflow in preloaded hostuuid cleaning
When a module of type "hostuuid" is provided by the loader,
prison0_init strips any trailing whitespace and ASCII control
characters by (a) adjusting the buffer length, and (b) zeroing out
the characters in question, before storing it as the system's
hostuuid.
The buffer length adjustment was correct, but the zeroing overwrote
one byte higher in memory than intended -- in the typical case,
zeroing one byte past the end of the hostuuid buffer. Due to the
layout of buffers passed by the boot loader to the kernel, this will
be the first byte of a subsequent buffer.
This was *probably* harmless; prison0_init runs after preloaded kernel
modules have been linked and after the preloaded /boot/entropy cache
has been processed, so in both cases having the first byte overwritten
will not cause problems. We cannot however rule out the possibility
that other objects which are preloaded by the loader could suffer from
having the first byte overwritten.
Since the zeroing does not in fact serve any purpose, remove it and
trim trailing whitespace and ASCII control characters by adjusting
the buffer length alone.
Fixes: c3188289 Preload hostuuid for early-boot use
Reviewed by: kevans, markj
Mark Johnston [Thu, 13 May 2021 12:33:23 +0000 (08:33 -0400)]
fork: Suspend other threads if both RFPROC and RFMEM are not set
Otherwise, a multithreaded parent process may trigger races in
vm_forkproc() if one thread calls rfork() with RFMEM set and another
calls rfork() without RFMEM.
Also simplify vm_forkproc() a bit, vmspace_unshare() already checks to
see if the address space is shared.
Reported by: syzbot+0aa7c2bec74c4066c36f@syzkaller.appspotmail.com
Reported by: syzbot+ea84cb06937afeae609d@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30220
Mark Johnston [Thu, 13 May 2021 12:33:37 +0000 (08:33 -0400)]
posix timers: Check for overflow when converting to ns
Disallow a time or timer period value when the conversion to nanoseconds
would overflow. Otherwise it is possible to trigger a divison by zero
in realtime_expire_l(), where we compute the number of overruns by
dividing by the timer interval.
Fixes: 7995dae9 ("posix timers: Improve the overrun calculation")
Reported by: syzbot+5ab360bd3d3e3c5a6e0e@syzkaller.appspotmail.com
Reported by: syzbot+157b74ff493140d86eac@syzkaller.appspotmail.com
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30233
Cyril Zhang [Thu, 13 May 2021 12:55:06 +0000 (08:55 -0400)]
sort: Cache value of MB_CUR_MAX
Every usage of MB_CUR_MAX results in a call to __mb_cur_max. This is
inefficient and redundant. Caching the value of MB_CUR_MAX in a global
variable removes these calls and speeds up the runtime of sort. For
numeric sorting, runtime is almost halved in some tests.
PR: 255551
PR: 255840
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30170
netgraph/ng_bridge: Handle send errors during loop handling
If sending out a packet fails during the loop over all links, the
allocated memory is leaked and not all links receive a copy. This
patch fixes those problems, clarifies a premature abort of the loop,
and fixes a minory style(9) bug.
We generally like to avoid style changes when other changes are not
planned. In this case there are some makesyscalls.lua changes in the
pipeline, and this cleans up style nits in generated files that were
highlighted by experiments with clang-format.
Reviewed by: brooks, kevans
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30235
Guangyuan Yang [Sun, 16 May 2021 05:37:09 +0000 (01:37 -0400)]
kerberos.8: Replace dead link
Replace it with a tutorial hosted on kerberos.org and the classic
"dialogue" from Bill Bryant. The change has been reported and
merged upstream (https://github.com/heimdal/heimdal/commit/7f3445f1b7).
Mark Johnston [Wed, 12 May 2021 15:49:24 +0000 (11:49 -0400)]
nd6: Avoid using an uninitialized sockaddr in nd6_prefix_offlink()
Commit 81728a538 ("Split rtinit() into multiple functions.") removed
the initialization of sa6, but not one of its uses. This meant that we
were passing an uninitialized sockaddr as the address to
lltable_prefix_free(). Remove the variable outright to fix the problem.
The caller is expected to hold a reference on pr.
Fixes: 81728a538 ("Split rtinit() into multiple functions.")
Reported by: KMSAN
Reviewed by: donner
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30166
Mark Johnston [Wed, 12 May 2021 14:05:37 +0000 (10:05 -0400)]
if: Remove unnecessary validation in the SIOCSIFNAME handler
A successful copyinstr() call guarantees that the returned string is
nul-terminated. Furthermore, the removed check would harmlessly compare
an uninitialized byte with '\0' if the new name is shorter than
IFNAMESIZ - 1.
Reported by: KMSAN
Sponsored by: The FreeBSD Foundation
Kristof Provost [Wed, 12 May 2021 17:13:40 +0000 (19:13 +0200)]
tests: Only log critical errors from scapy
Since 2.4.5 scapy started issuing warnings about a few different
configurations during our tests. These are harmless, but they generate
stderr output, which upsets atf_check.
Configure scapy to only log critical errors (and thus not warnings) to
fix these tests.
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.
sbin/ipfw: Fix parsing error in table based forward
The argument parser does not recognise the optional port for an
"tablearg" argument. Fix simplifies the code by make the internal
representation expicit for the parser. Includes the fix from D30208.
Mark Johnston [Mon, 3 May 2021 16:51:04 +0000 (12:51 -0400)]
Add missing sockaddr length and family validation to various protocols
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
Kristof Provost [Tue, 4 May 2021 17:23:15 +0000 (19:23 +0200)]
in6_mcast: Return EADDRINUSE when we've already joined the group
Distinguish between truly invalid requests and those that fail because
we've already joined the group. Both cases fail, but differentiating
them allows userspace to make more informed decisions about what the
error means.
For example. radvd tries to join the all-routers group on every SIGHUP.
This fails, because it's already joined it, but this failure should be
ignored (rather than treated as a sign that the interface's multicast is
broken).
This puts us in line with OpenBSD, NetBSD and Linux.
AIM (adaptive interrupt moderation) was part of BSD11 driver. Upon IFLIB
migration, AIM feature got lost. Re-introducing AIM back into IFLIB
based IXGBE driver.
One caveat is that in BSD11 driver, a queue comprises both Rx and Tx
ring. Starting from BSD12, Rx and Tx have their own queues and rings.
Also, IRQ is now only configured for Rx side. So, when AIM is
re-enabled, we should now consider only Rx stats for configuring EITR
register in contrast to BSD11 where Rx and Tx stats were considered to
manipulate EITR register.
Rick Macklem [Sun, 2 May 2021 23:04:27 +0000 (16:04 -0700)]
copy_file_range(2): improve copying of a large hole to EOF
PR#255523 reported that a file copy for a file with a large hole
to EOF on ZFS ran slowly over NFSv4.2.
The problem was that vn_generic_copy_file_range() would
loop around reading the hole's data and then see it is all
0s. It was coded this way since UFS always allocates a data
block near the end of the file, such that a hole to EOF never exists.
This patch modifies vn_generic_copy_file_range() to check for a
ENXIO returned from VOP_IOCTL(..FIOSEEKDATA..) and handle that
case as a hole to EOF. asomers@ confirms that it works for his
ZFS test case.