Randall Stewart [Thu, 2 Dec 2021 11:12:16 +0000 (06:12 -0500)]
tcp: unloading a module that is set to default should error.
I just discovered that the return of the EBUSY error was incorrectly
rigged so that you could unload a CC module that was set to default.
It's supposed to be an EBUSY error. Make it so.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D33229
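As a rough illustration of the intended behavior (not the kernel code itself), the check can be sketched as follows. The struct, the `default_cc` variable, and `cc_unload()` are hypothetical names invented for this sketch; only the EBUSY result for the default module comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a congestion control module. */
struct cc_module {
	const char *name;
	int loaded;
};

/* Hypothetical pointer to whichever module is currently the default. */
static struct cc_module *default_cc;

/*
 * Refuse to unload the module that is set as the default;
 * any other module is simply marked as unloaded.
 */
static int
cc_unload(struct cc_module *m)
{
	if (m == default_cc)
		return (EBUSY);
	m->loaded = 0;
	return (0);
}
```

With `default_cc` pointing at one module, unloading that module returns EBUSY and leaves it loaded, while unloading any other module succeeds.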
Ed Maste [Wed, 1 Dec 2021 21:49:16 +0000 (16:49 -0500)]
OptionalObsoleteFiles.inc: remove MK_CXX rule for usr/bin/c++
In fact MK_CXX does not control whether /usr/bin/c++ is built -- it is
installed as a link to Clang (which is always a C/C++ compiler), and it
already exists in OptionalObsoleteFiles under MK_TOOLCHAIN.
Rick Macklem [Wed, 1 Dec 2021 21:55:17 +0000 (13:55 -0800)]
nfsd: Sanity check the ACL attribute
When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.
The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.
Rick Macklem [Wed, 1 Dec 2021 21:46:41 +0000 (13:46 -0800)]
nfsd: Do not try to cache a reply for NFSERR_BADSLOT
When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.
Ed Maste [Wed, 1 Dec 2021 21:38:10 +0000 (16:38 -0500)]
OptionalObsoleteFiles: move /usr/bin/CC to MK_TOOLCHAIN section
/usr/bin/CC is installed by usr.bin/clang/clang/Makefile, as with
/usr/bin/cc, /usr/bin/cpp, etc., and is not controlled by MK_CXX.
Move it to the same section as those tools.
(It may be that these should all be under
MK_TOOLCHAIN == no || MK_CLANG_IS_CC == no, but that seems like
unnecessary complexity.)
The ifp (struct ifnet) backpointer in the e1000 private ifnet
data is not used anymore since the iflib transition.
Remove it so that developers are not tempted to use it and
get a NULL pointer dereference.
Michael Tuexen [Wed, 1 Dec 2021 15:20:17 +0000 (16:20 +0100)]
libc sctp: fix sctp_getladdrs() when reporting no addresses
Section 9.5 of RFC 6458 (SCTP Socket API) requires that
sctp_getladdrs() returns 0 in case the socket is unbound. This
is the cause of reporting 0 addresses. So don't indicate an
error, just report this case as required.
Michael Tuexen [Wed, 1 Dec 2021 09:13:20 +0000 (10:13 +0100)]
libc sctp: fix sctp_getladdrs() for 64-bit BE platforms
When calling getsockopt() with SCTP_GET_LOCAL_ADDR_SIZE, use a
pointer to a 32-bit variable, since this is what the kernel
expects.
While there, do some cleanups.
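The width mismatch can be reproduced in userland without any SCTP code. In the sketch below, `fake_kernel_write_u32()` is a hypothetical stand-in for a kernel that stores exactly 32 bits through the option pointer; the two query helpers are likewise invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a kernel that writes exactly 32 bits. */
static void
fake_kernel_write_u32(void *optval, uint32_t value)
{
	memcpy(optval, &value, sizeof(value));
}

/* Correct: the caller passes a pointer to a 32-bit variable. */
static uint32_t
query_size_ok(uint32_t kernel_value)
{
	uint32_t size = 0;

	fake_kernel_write_u32(&size, kernel_value);
	return (size);
}

/*
 * Problematic: the caller passes a pointer to a 64-bit variable.
 * Only the first four bytes are written. On little-endian those are
 * the low-order bytes, so the result happens to be right; on 64-bit
 * big-endian they are the high-order bytes, so the value ends up
 * shifted left by 32 bits.
 */
static uint64_t
query_size_wide(uint32_t kernel_value)
{
	uint64_t size = 0;

	fake_kernel_write_u32(&size, kernel_value);
	return (size);
}
```

This is why the bug stays hidden on little-endian machines and only shows up on 64-bit big-endian ones.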
Warner Losh [Tue, 30 Nov 2021 22:03:26 +0000 (15:03 -0700)]
Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire
Giant before calling device_busy/unbusy. However, if multiple threads
are opening a file, say, that causes the device to busy/unbusy, then we
can race to the root marking things busy. Move to using a reference
count to keep track of how many times a device_t has been made busy. Use
that count to make the same decisions that we'd make with the old device
state.
Note: gpiopps.c uses D_TRACKCLOSE. Others do as well. However, there's a
known race with closes that will be corrected for all the drivers that
do this in a future commit.
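A minimal userland sketch of the reference-count idea, using C11 atomics rather than the kernel's refcount API. The `fake_device` type and function names are hypothetical, and the propagation to the parent on the first busy and last unbusy is an assumption based on the "race to the root" description above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device node. */
struct fake_device {
	struct fake_device *parent;
	atomic_uint busy;	/* outstanding busy references */
};

static void
fake_device_busy(struct fake_device *dev)
{
	/* atomic_fetch_add() returns the previous value: 0 means first user. */
	if (atomic_fetch_add(&dev->busy, 1) == 0 && dev->parent != NULL)
		fake_device_busy(dev->parent);
}

static void
fake_device_unbusy(struct fake_device *dev)
{
	/* Previous value 1 means the last reference just went away. */
	if (atomic_fetch_sub(&dev->busy, 1) == 1 && dev->parent != NULL)
		fake_device_unbusy(dev->parent);
}

static bool
fake_device_is_busy(struct fake_device *dev)
{
	return (atomic_load(&dev->busy) > 0);
}
```

Because only the 0-to-1 and 1-to-0 transitions touch the parent, two threads busying the same child concurrently cannot both mark the parent, and no global lock is needed.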
Commit message was for a very old version of the patch. Will re-commit
with the right one since it's so bad. There's no locked versions of
it...that code was reworked to use refcnt APIs.
Warner Losh [Tue, 30 Nov 2021 22:03:26 +0000 (15:03 -0700)]
Make device_busy/unbusy work w/o Giant held
The vast majority of the busy/unbusy users in the tree don't acquire Giant
before calling device_busy/unbusy. However, if multiple threads are opening a
file, say, that causes the device to busy/unbusy, then we can race to the root
marking things busy. Create a new device_busy_locked and device_unbusy_locked
that are the current implementations of device_busy and device_unbusy. Make
device_busy and unbusy acquire Giant before calling the _locked versions. Since
we never sleep in the busy/unbusy path, Giant's single threaded semantics
suffice to keep this safe.
Chuck Tuffli [Wed, 1 Dec 2021 05:07:32 +0000 (21:07 -0800)]
bhyve blockif: fix blockif_candelete with Capsicum
NVMe conformance tests for the Format command failed if the
backing-storage for the bhyve device was a file instead of a Zvol. The
tests (and the specification) expect a Format to destroy all previously
written data. The bhyve NVMe emulation implements this by trimming /
deallocating all data from the backing-storage.
The blockif_candelete() function indicated the file did not support
deallocation (i.e. fpathconf(..., _PC_DEALLOC_PRESENT) returned FALSE)
even though the kernel supported file hole punching. This occurs on
builds with Capsicum enabled because blockif did not allow the
fpathconf(2) right.
Fix is to add CAP_FPATHCONF to the cap_rights_init(3) call.
This was found while looking for driver_filter_t functions which got the
trap frame from the argument. In this particular instance it isn't even
used, so remove now lest someone else get to it first.
* It sets SO_SNDBUF to be as large as MAXLINE. But for unix domain
sockets, the send buffer is bypassed. Packets go directly to the
peer's receive buffer, so setting and querying SO_SNDBUF is
ineffective. To ensure that the socket can accept messages of a
certain size, it would be necessary to add a SO_PEERRCVBUF socket
option that could query the connected peer's receive buffer size.
* It sets MAXLINE to 8 kB, which is larger than the default sockbuf size
of 4 kB. That's ok for the builtin syslogd, which sets its recvbuf
to 80 kB, but not ok for alternative sysloggers, like rsyslogd, which
use the default size.
As a consequence, writing messages of more than 4 kB with syslog() as a
non-root user while running rsyslogd would cause the logging application
to spin indefinitely within syslog().
Stefan Eßer [Tue, 30 Nov 2021 17:33:40 +0000 (18:33 +0100)]
vendor/bc: import release 5.2.1
This release fixes two parse bugs when in POSIX standard mode. One of
these bugs was due to a quirk of the POSIX grammar, and the other was
because bc was too strict.
Kristof Provost [Tue, 30 Nov 2021 15:30:22 +0000 (16:30 +0100)]
if_stf: KASAN fix
In in_stf_input() we grabbed a pointer to the IPv4 header and later did
an m_pullup() before we look at the IPv6 header. However, m_pullup()
could rearrange the mbuf chain and potentially invalidate the pointer to
the IPv4 header.
Avoid this issue by copying the IP header rather than getting a pointer
to it.
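The same hazard exists in any code that keeps a pointer into a buffer that may be reallocated. The sketch below is a userland analogy, not mbuf code: `fake_pullup()` is a hypothetical stand-in that always moves the data, and `struct hdr4` is an invented two-byte header.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tiny header, standing in for the IPv4 header. */
struct hdr4 {
	uint8_t ttl;
	uint8_t proto;
};

/*
 * Hypothetical stand-in for m_pullup(): returns a new, larger buffer
 * and frees the old one, so old pointers into it become invalid.
 */
static uint8_t *
fake_pullup(uint8_t *buf, size_t oldlen, size_t newlen)
{
	uint8_t *n = malloc(newlen);

	if (n != NULL)
		memcpy(n, buf, oldlen < newlen ? oldlen : newlen);
	free(buf);
	return (n);
}

/*
 * Copy the header out before the buffer can move, then use the copy.
 * The caller must pass at least sizeof(struct hdr4) bytes.
 */
static int
read_proto_after_pullup(uint8_t **bufp, size_t len)
{
	struct hdr4 copy;

	memcpy(&copy, *bufp, sizeof(copy));
	*bufp = fake_pullup(*bufp, len, len + 16);
	if (*bufp == NULL)
		return (-1);
	return (copy.proto);
}
```

Had the function kept a `struct hdr4 *` into the old buffer instead of a copy, reading `proto` after `fake_pullup()` would be a use-after-free, which is the class of bug KASAN reports.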
Mitchell Horne [Thu, 25 Nov 2021 16:01:11 +0000 (12:01 -0400)]
Implement GET_STACK_USAGE on remaining archs
This definition enables callers to estimate remaining space on the
kstack, and take action on it. Notably, it enables optimizations in the
GEOM and netgraph subsystems to directly dispatch work items when there
is sufficient stack space, rather than queuing them for a worker thread.
Implement it for riscv, arm, and mips. Remove the #ifdefs, so it will
not go unimplemented elsewhere.
Mitchell Horne [Tue, 30 Nov 2021 15:15:44 +0000 (11:15 -0400)]
arm64, powerpc: fix calculation of 'used' in GET_STACK_USAGE
We do not consider the space reserved for the pcb to be part of the
total kstack size, so it should not be included in the calculation of
the used stack size.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Mitchell Horne [Thu, 25 Nov 2021 15:54:33 +0000 (11:54 -0400)]
i386: take pcb and fpu area into account in GET_STACK_USAGE
On this platform, the pcb and FPU save area are allocated from the top
of each kernel stack, so they should be excluded from the calculation of
the total and used stack sizes.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32581
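The arithmetic can be sketched as below, assuming a stack that grows downward with a reserved area (pcb, FPU save area) at its top. The struct, function name, and parameters are hypothetical; the real GET_STACK_USAGE is a per-architecture macro.

```c
#include <stddef.h>
#include <stdint.h>

struct stack_usage {
	size_t total;	/* usable stack bytes */
	size_t used;	/* bytes currently in use */
};

/*
 * base:     lowest address of the stack allocation
 * size:     size of the whole allocation
 * reserved: bytes at the top that are not usable as stack
 * sp:       current stack pointer (stack grows toward base)
 */
static struct stack_usage
stack_usage(uintptr_t base, size_t size, size_t reserved, uintptr_t sp)
{
	struct stack_usage u;
	uintptr_t usable_top = base + size - reserved;

	u.total = size - reserved;
	u.used = usable_top - sp;
	return (u);
}
```

Leaving `reserved` out of both numbers keeps `used / total` an honest estimate of how much stack is left for the caller.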
Bjoern A. Zeeb [Tue, 30 Nov 2021 14:23:18 +0000 (14:23 +0000)]
fw_stub: fix -Wunused-but-set-variable for firmware files
In case we are only embedding a single firmware image the variable
"parent" gets set but never used. Add checks for the number of files
for it and only print it out if we are exceeding the single file count.
This fixes -Wunused-but-set-variable warnings for the majority of
firmware files in the tree.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Andriy Gapon [Tue, 30 Nov 2021 13:23:23 +0000 (15:23 +0200)]
kern_tc: unify timecounter to bintime delta conversion
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small. But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different. Both functions use
approximate calculations based on th_scale that avoid division. Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be. tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.
As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta). Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off. tc_windup does the switch using its own
calculations of a new th_offset using the large delta. As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime. So, for a period of time new binuptime values may
be "back in time" compared to values just before the switch.
Such a jump must never happen. All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken. For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it. In that case the target thread
would never get woken up.
The unified calculations should ensure the monotonic property of the
uptime.
The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100). But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.
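One way to share the conversion between both callers is a single helper that handles the small and large delta cases. The following is a simplified userland sketch, not the committed code: `struct bt` is a reduced bintime, and the helper names and the split-multiplication approach for large deltas are assumptions made for illustration.

```c
#include <stdint.h>

/* Simplified bintime: whole seconds plus a 64-bit binary fraction. */
struct bt {
	int64_t sec;
	uint64_t frac;
};

/* Add a fraction, carrying into seconds on wraparound. */
static void
bt_addx(struct bt *b, uint64_t x)
{
	uint64_t old = b->frac;

	b->frac += x;
	if (b->frac < old)
		b->sec++;
}

/*
 * Add delta ticks, where scale is roughly 2^64 / frequency.
 * For a large delta, scale * delta would overflow 64 bits, so the
 * multiplication is split into high and low 32-bit halves of scale.
 */
static void
bt_add_tc_delta(struct bt *b, uint64_t scale, uint64_t large_delta,
    uint32_t delta)
{
	uint64_t x;

	if (delta >= large_delta) {
		x = (scale >> 32) * delta;
		b->sec += x >> 32;
		bt_addx(b, x << 32);
		bt_addx(b, (scale & 0xffffffff) * delta);
	} else {
		bt_addx(b, scale * delta);
	}
}
```

If both the windup path and the read path call the same helper with the same scale, a given delta always maps to the same bintime increment, which is the property needed to keep the uptime from stepping backward.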
Wojciech Macek [Thu, 25 Nov 2021 09:36:55 +0000 (10:36 +0100)]
flex_spi: Support for FlexSPI Flash controller.
NXP FlexSPI is a complex SPI controller which provides
full offload for accessing NOR Flash.
Create a Flash driver which attaches to existing FreeBSD
infrastructure and exports generic READ and WRITE disk commands.
The Flash has to be identified first to configure controller
internals. For now, only one NOR Flash chip is supported.
Future commits shall either increase number of known chips
or implement SFDP mechanism which can be used by other Flash
drivers.
Alexander Motin [Tue, 30 Nov 2021 01:14:13 +0000 (20:14 -0500)]
mpsutil: Fix data truncation by too short buffers.
Length of some string buffers was insufficient for cases of more than
99 targets per HBA or slots per enclosure. Some others are tuned just
for better alignment. While there also fix output formatting issues.
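The truncation is easy to reproduce with snprintf(3), which reports the length it would have written. The helper below is a hypothetical illustration and is not taken from mpsutil.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a target number into buf and report whether it fit.
 * snprintf() returns the number of characters the full output needs,
 * excluding the terminating NUL, so n >= buflen means truncation.
 */
static bool
format_fits(char *buf, size_t buflen, int target)
{
	int n = snprintf(buf, buflen, "%d", target);

	return (n >= 0 && (size_t)n < buflen);
}
```

A three-byte buffer holds "99" plus the terminator, but "100" is silently cut to "10" unless the return value is checked.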
rc.conf(5): Add _limits, _login_class, and _oomprotect
Add a few very useful variables that might easily be overlooked, since
they're only documented in rc.subr(8) which might not be the first place
that people look.
At least _oomprotect has existed since 11.0-RELEASE, and doesn't appear
to be very well-known. While the others aren't as new, in my estimation,
a lot more people would use them if they knew about them.
While here, also add a reference to rc.subr(8) and login.conf(5), and
sort the variables alphabetically.
Reported by: Daniel Dettlaff <dmilith at gmail.com>
Reviewed by: ceri, gbe, 0mp, ygy, a.wolk, pauamma
Brooks Davis [Mon, 29 Nov 2021 22:03:00 +0000 (22:03 +0000)]
syscalls: make syscall and __syscall SYSMUX
Rather than combining the declaration of nosys with the registration
of SYS_syscall, declare syscall(2) and __syscall(2) with the new
SYSMUX type in syscalls.master and declare nosys directly. This
eliminates the last use of syscall aliases in the tree.
Brooks Davis [Mon, 29 Nov 2021 22:03:00 +0000 (22:03 +0000)]
makesyscalls: add a new SYSMUX type
This type is for system call multiplexers (syscall(2), __syscall(2))
that don't have a normal handler and instead are handled in the
machine-dependent syscall code.
Brooks Davis [Mon, 29 Nov 2021 22:03:00 +0000 (22:03 +0000)]
syscalls: normalize exit
Declare the exit system call normally. This results in the
implementation being named sys_exit rather than sys_sys_exit and
being declared as returning an int. In fact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.
Brooks Davis [Mon, 29 Nov 2021 22:02:59 +0000 (22:02 +0000)]
uipc: rework recvfrom, getsockname, getpeername
Stop using <foo>_args structs as part of internal kernel APIs. Add
a kern_recvfrom and adjust getsockname and getpeername's equivalent
functions to take individual arguments rather than a uap pointer.
Adopt a convention from CheriBSD that a function interacting with
userspace pointers and sitting between the sys_<foo> syscall and
kern_<foo> implementation is named user_<foo>.
Mark Johnston [Mon, 29 Nov 2021 18:50:30 +0000 (13:50 -0500)]
dummynet: Fix socket option length validation for IP_DUMMYNET3
The socket option handler tries to ensure that the option length is no
larger than some reasonable maximum, and no smaller than sizeof(struct
dn_id). But the loaded option length is stored in an int, which is
converted to an unsigned integer for the comparison with a size_t, so
negative values are not caught and instead get passed to malloc().
Change the code to use a size_t for the buffer size.
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33133
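The pitfall can be shown in isolation. In this sketch `struct fake_id` stands in for struct dn_id and `MAX_LEN` is a hypothetical limit; the explicit `(size_t)` cast in the first function spells out the conversion C performs implicitly when an int is compared with a size_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dn_id. */
struct fake_id {
	unsigned char bytes[16];
};

#define MAX_LEN 12000	/* hypothetical upper bound, an int constant */

/*
 * Flawed check: the length is an int. In the lower-bound comparison it
 * is converted to size_t, so -1 becomes a huge value and is not "too
 * small"; in the upper-bound comparison it stays an int, so -1 is not
 * "too large" either.
 */
static bool
len_ok_int(int len)
{
	return (!((size_t)len < sizeof(struct fake_id) || len > MAX_LEN));
}

/* Fixed check: with a size_t length both comparisons are unsigned. */
static bool
len_ok_size(size_t len)
{
	return (!(len < sizeof(struct fake_id) || len > MAX_LEN));
}
```

With these definitions `len_ok_int(-1)` accepts the length, while `len_ok_size((size_t)-1)` rejects it.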
Mark Johnston [Mon, 29 Nov 2021 18:50:21 +0000 (13:50 -0500)]
dummynet: Avoid an out-of-bounds read in do_config()
do_config() processes a buffer of variable-length dummynet commands.
The loop which processes this buffer loads the fixed-length header
before checking whether there are any bytes left to read, so it performs
a 4-byte read past the end of the buffer before terminating.
Restructure the loop to avoid this.
Reported by: Jenkins (KASAN job)
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33132
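A loop over variable-length records avoids the over-read if it checks the remaining byte count before loading each header. The record layout and `count_records()` below are hypothetical and only illustrate the loop shape, not the dummynet command format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte record header; len includes the header itself. */
struct rec_hdr {
	uint16_t len;
	uint16_t type;
};

/*
 * Count the records in buf, or return -1 if the buffer is malformed.
 * The header is only read once at least sizeof(h) bytes remain.
 */
static int
count_records(const uint8_t *buf, size_t buflen)
{
	struct rec_hdr h;
	size_t off = 0;
	int n = 0;

	while (buflen - off >= sizeof(h)) {
		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			return (-1);
		off += h.len;
		n++;
	}
	/* Leftover bytes too short for a header mean a truncated buffer. */
	return (off == buflen ? n : -1);
}
```

The loop condition doubles as the termination test, so no header load ever happens at or past the end of the buffer.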
Swap on file requires operational underlying mount, otherwise
swapoff_all() is guaranteed to panic due to the default strategy VOP for
reclaimed vnodes.
Reported and tested by: peterj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33147
swapoff_one(): only check free pages count when manually turning swap off
When swap is turned off due to system shutdown or reboot, ignore the
check. The problem is that the check is not accurate by any means: the free
page count can legitimately be low while the system is still able to page in
everything from the swap. Then, we turn swap off if swapping on a
real file or some non-standard geom provider, and typically panic
when the system turns out to actually need an unavailable page.
For syscall, it is better to be safe than sorry.
Reported and tested by: peterj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33147
Kornel Duleba [Sun, 28 Nov 2021 11:24:07 +0000 (12:24 +0100)]
mmc: Fix HS200/HS400 capability check
HS200 and HS400 speeds can be enabled either with 1.2, or 1.8V signaling voltage.
Because of that we have four capability flags: MMC_CAP_MMC_HS200_120,
MMC_CAP_MMC_HS200_180, MMC_CAP_MMC_HS400_120, MMC_CAP_MMC_HS400_180.
MMC logic only enables HS200/HS400 mode if both flags are set for the corresponding speed.
Fix that by being more permissive in host timing cap check.
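The difference between the strict and the permissive check comes down to how the bitmask is tested. The flag names below are the ones listed above, but the bit values and helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names from the commit message; the bit values are made up. */
#define MMC_CAP_MMC_HS200_120	(1u << 0)
#define MMC_CAP_MMC_HS200_180	(1u << 1)

#define HS200_MASK	(MMC_CAP_MMC_HS200_120 | MMC_CAP_MMC_HS200_180)

/* Too strict: requires both signaling voltages to be supported. */
static bool
hs200_supported_strict(uint32_t caps)
{
	return ((caps & HS200_MASK) == HS200_MASK);
}

/* Permissive: either signaling voltage is enough. */
static bool
hs200_supported(uint32_t caps)
{
	return ((caps & HS200_MASK) != 0);
}
```

A host that advertises only MMC_CAP_MMC_HS200_180 fails the strict test but passes the permissive one; the same pattern applies to the HS400 flags.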
libc/tests/stdlib/dynthr_mod/dynthr_mod.c: mark dummy as used
It receives the malloc() result, and we do not want the malloc() call
to be optimized out, which is allowed for a hosted compiler. Use dummy
for an actual write, though.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week