Dimitry Andric [Thu, 26 Aug 2021 15:36:03 +0000 (17:36 +0200)]
Add -Wno-error=unused-but-set-variable when building with Clang 13+
This warning triggers many times while building world. Downgrade it to a
warning until all occurrences have been fixed. Once the Clang warnings
have been fixed we should be able to turn it on for GCC as well. See
also f4fed768bba45a406f73ed1491d7e52fd1a8711d which did the same for the
kernel builds.
Andrew Turner [Mon, 13 Sep 2021 15:24:34 +0000 (15:24 +0000)]
Restrict spsr updated in the arm64 set_regs*
When using ptrace(2) on arm64 to set registers in a 32-bit program we
need to take care to only set some of the fields. Follow the existing
arm64 path and only let the user set the flags fields. This is also the
case in the arm kernel so fixes a change in behaviour between the two.
While here update set_regs to only set spsr and elr once.
Ed Maste [Sun, 12 Sep 2021 23:04:31 +0000 (19:04 -0400)]
libprocstat: extend zfs_defs hack for .pieo
By default _pie.a archives are built only for INTERNALLIBs, so there is
usually no need for zfs_defs.pieo to exist. However, some experimental
work builds _pie.a archives for everything. Extend the existing set of
zfs_defs hacks to build zfs_defs.pieo as well.
Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31924
Ed Maste [Sun, 12 Sep 2021 16:45:50 +0000 (12:45 -0400)]
bsd.lib.mk: add conditions for building _pie.a archives
As with other .a targets, build _pie.a archives only if LIB is set.
At present we build _pie.a only for INTERNALLIBs, and none of them
include bsd.lib.mk without setting LIB. However, we might want to build
_pie.a for non-INTERNALLIBs in the future.
Reviewed by: arichardson
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31920
Previosuly the link status was pooled in an infinite loop in a separate
kproc. Use taskqueue subsytem instead. This is a prequisite for making
this driver work as a loadable module.
Alex Richardson [Mon, 13 Sep 2021 09:16:05 +0000 (10:16 +0100)]
Add a test for https://reviews.freebsd.org/D31858 (PR 258310)
This test (based on https://github.com/jiixyj/epoll-shim/pull/32#issuecomment-891276654)
fails reproducibly on QEMU with KVM and `-smp 2` prior to D31858 (committed
as 98168a6e6c12dab8f608f6b5f5b0b175d2b87ef0) and passes with the patch applied.
Mark Johnston [Sun, 12 Sep 2021 20:05:49 +0000 (16:05 -0400)]
socket: Do not include control messages in FIONREAD return value
Some system software expects to be able to read at least the number of
bytes returned by FIONREAD. When control messages are counted in this
return value, this assumption is violated. Follow Linux and OpenBSD
here (as well as our own kevent(EVFILT_READ)) and only return the number
of data bytes available.
Rick Macklem [Sat, 11 Sep 2021 22:36:32 +0000 (15:36 -0700)]
nfscl: Make vfs.nfs.maxcopyrange larger by default
As of commit 103b207536f9, the NFSv4.2 server will limit the size
of a Copy operation based upon a 1 second timeout. The Linux 5.2
kernel server also limits Copy operation size to 4Mbytes.
As such, the NFSv4.2 client can attempt a large Copy without
resulting in a long RPC RTT for these servers.
This patch changes vfs.nfs.maxcopyrange to 64bits and sets
the default to the maximum possible size of SSIZE_MAX, since
a larger size makes the Copy operation more efficient and
allows for copying to complete with fewer RPCs.
The sysctl may be need to be made smaller for other non-FreeBSD
NFSv4.2 servers.
Mark Johnston [Sat, 11 Sep 2021 16:55:32 +0000 (12:55 -0400)]
aio: Fix up the opcode in aiocb32_copyin()
With lio_listio(2), the opcode is specified by userspace rather than
being hard-coded by the system call (e.g., aio_readv() -> LIO_READV).
kern_lio_listio() calls aio_aqueue() with an opcode of LIO_NOP, which
gets fixed up when the aiocb is copied in.
When copying in a job request for vectored I/O, we need to dynamically
allocate a uio to wrap an iovec. So aiocb_copyin() needs to get the
opcode from the aiocb and then decide whether an allocation is required.
We failed to do this in the COMPAT_FREEBSD32 case. Fix it.
Reported by: syzbot+27eab6f2c2162f2885ee@syzkaller.appspotmail.com
Reviewed by: kib, asomers
Fixes: f30a1ae8d529 ("lio_listio(2): Allow LIO_READV and LIO_WRITEV.")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31914
Mark Johnston [Sat, 11 Sep 2021 14:15:21 +0000 (10:15 -0400)]
sctp: Tighten up locking around sctp_aloc_assoc()
All callers of sctp_aloc_assoc() mark the PCB as connected after a
successful call (for one-to-one-style sockets). In all cases this is
done without the PCB lock, so the PCB's flags can be corrupted. We also
do not atomically check whether a one-to-one-style socket is a listening
socket, which violates various assumptions in solisten_proto().
We need to hold the PCB lock across all of sctp_aloc_assoc() to fix
this. In order to do that without introducing lock order reversals, we
have to hold the global info lock as well.
So:
- Convert sctp_aloc_assoc() so that the inp and info locks are
consistently held. It returns with the association lock held, as
before.
- Fix an apparent bug where we failed to remove an association from a
global hash if sctp_add_remote_addr() fails.
- sctp_select_a_tag() is called when initializing an association, and it
acquires the global info lock. To avoid lock recursion, push locking
into its callers.
- Introduce sctp_aloc_assoc_connected(), which atomically checks for a
listening socket and sets SCTP_PCB_FLAGS_CONNECTED.
There is still one edge case in sctp_process_cookie_new() where we do
not update PCB/socket state correctly.
Reviewed by: tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31908
Ka Ho Ng [Sat, 11 Sep 2021 12:03:38 +0000 (20:03 +0800)]
md: Add MD_MUSTDEALLOC support
This adds an option to detect if hole-punching is implemented by the
underlying file system. If this flag is set, and if the underlying file
system does not support hole-punching, md(4) fails BIO_DELETE requests
with EOPNOTSUPP.
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D31883
John Baldwin [Fri, 10 Sep 2021 22:10:00 +0000 (15:10 -0700)]
cxgbei: Handle errors in PDUs.
When a PDU with an error (bad padding, header digest, or data digest)
is received, log the error via ICL_WARN() and then reset the
connection via the ic_error callback.
Mark Johnston [Fri, 10 Sep 2021 21:21:11 +0000 (17:21 -0400)]
aio: Interlock with listen(2)
soo_aio_queue() did not handle the possibility that the provided socket
is a listening socket. Up until recently, to fix this one would have to
acquire the socket lock first and check, since the socket buffer locks
were destroyed by listen(2).
Now that the socket buffer locks belong to the socket, simply check
SOLISTENING(so) after acquiring them, and make listen(2) return an error
if any AIO jobs are enqueued on the socket.
Add a couple of simple regression test cases.
Note that this fixes things only for the default AIO implementation;
cxgbe(4)'s TCP offload has a separate pru_aio_queue implementation which
requires its own solution.
Mark Johnston [Fri, 10 Sep 2021 21:20:39 +0000 (17:20 -0400)]
socket: Add macros to lock socket buffers using socket references
Since commit c67f3b8b78e50c6df7c057d6cf108e4d6b4312d0 the sockbuf
mutexes belong to the containing socket. Sockbufs contain a pointer to
a mutex, which by default is initialized to the corresponding mutexes in
the socket. The SOCKBUF_LOCK() etc. macros operate on this pointer.
However, the pointer is clobbered by listen(2) so it's not safe to use
them unless one is sure that the socket is not a listening socket.
This change introduces a new set of macros which lock socket buffers
through the socket. This is a bit cheaper since it removes the pointer
indirection, and allows one to safely lock socket buffers and then check
for a listening socket.
For MFC, these macros should be reimplemented in terms of the existing
socket buffer layout.
Reviewed by: tuexen, gallatin, jhb
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31900
Simon J. Gerraty [Fri, 10 Sep 2021 20:11:28 +0000 (13:11 -0700)]
make: fix MAKE_JOB_ERROR_TOKEN
The rework of GetBooleanVar to GetBooleanExpr requires
we add "${" and ":U}" around the expression so it can be directly
evaluated.
Reported by: mjg
MFC after: 1 week
#
# 72 columns --|
#
# Uncomment and complete these metadata fields, as appropriate:
#
# PR: <If and which Problem Report is related.>
# Reported by: <If someone else reported the issue.>
# Reviewed by: <If someone else reviewed your modification.>
# Approved by: <If you needed approval for this commit.>
# Obtained from: <If the change is from a third party.>
# MFC after: <N [day[s]|week[s]|month[s]]. Request a reminder email>
# MFH: <Ports tree branch name. Request approval for merge.>
# Relnotes: <Set to 'yes' for mention in release notes.>
# Security: <Vulnerability reference (one per line) or description.>
# Sponsored by: <If the change was sponsored by an organization.>
# Pull Request: <https://github.com/freebsd/<repo>/pull/###>
# Differential Revision: <https://reviews.freebsd.org/D###>
#
# "Pull Request" and "Differential Revision" require the *full* GitHub or
# Phabricator URL. The commit author should be set appropriately, using
# `git commit --author` if someone besides the committer sent in the change.
#
# Uncomment and complete these metadata fields, as appropriate:
#
# PR:
# Reported by: <If someone else reported the issue.>
# Reviewed by: <If someone else reviewed your modification.>
# Approved by: <If you needed approval for this commit.>
# Obtained from: <If the change is from a third party.>
# MFC after: <N [day[s]|week[s]|month[s]]. Request a reminder email>
# MFH: <Ports tree branch name. Request approval for merge.>
# Relnotes: <Set to 'yes' for mention in release notes.>
# Security: <Vulnerability reference (one per line) or description.>
# Sponsored by: <If the change was sponsored by an organization.>
# Pull Request: <https://github.com/freebsd/<repo>/pull/###>
# Differential Revision: <https://reviews.freebsd.org/D###>
#
# "Pull Request" and "Differential Revision" require the *full* GitHub or
# Phabricator URL. The commit author should be set appropriately, using
# `git commit --author` if someone besides the committer sent in the change.
#
Gleb Smirnoff [Fri, 6 Aug 2021 23:04:31 +0000 (16:04 -0700)]
ng_l2tp: improve callout locking.
Apparently e62e4b85942 wasn't enough to close the race between
a queue being flushed by a packet and callout executing, because
the callouts used without a lock aren't 100% bulletproof. To close
the race use callout_init_mtx() for L2TP timers, and make sure that
all calls to ng_callout()/ng_uncallout() are done under the seq lock.
If used properly, a locked callout can be used transparently with
old netgraph KPI of ng_callout/ng_uncallout which predates locked
callouts.
While here, utilize ng_uncallout_drain() instead of ng_uncallout()
on the node shutdown.
Gleb Smirnoff [Fri, 6 Aug 2021 22:49:51 +0000 (15:49 -0700)]
ng_l2tp: improve seq structure locking.
Cover few cases of access to seq without lock missed in 702f98951d5.
There are no known bugs fixed with this change, however. With INVARIANTS
embed ng_l2tp_seq_check() into lock/unlock macros. Slightly reduce number
of locks/unlocks per packet keeping the lock between functions.
Gleb Smirnoff [Fri, 6 Aug 2021 20:12:13 +0000 (13:12 -0700)]
netgraph: pass return value from callout_stop() unmodified to callers of
ng_uncallout. Most of them do not check it anyway, so very little node
changes are required.
tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.
Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.
When we're synproxy-ing a connection that's going to us (as opposed to a
forwarded one) we wound up trying to send out the pf-generated tcp
packets through pf_intr(), which called ip(6)_output(). That doesn't
work all that well for packets that are destined for us, so in that case
we must call ip(6)_input() instead.
Mark Johnston [Fri, 10 Sep 2021 13:07:59 +0000 (09:07 -0400)]
ipsec: Validate the protocol identifier in ipsec4_ctlinput()
key_allocsa() expects to handle only IPSec protocols and has an
assertion to this effect. However, ipsec4_ctlinput() has to handle
messages from ICMP unreachable packets and was not validating the
protocol number. In practice such a packet would simply fail to match
any SADB entries and would thus be ignored.
Reported by: syzbot+6a9ef6fcfadb9f3877fe@syzkaller.appspotmail.com
Reviewed by: ae
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31890
Mark Johnston [Fri, 10 Sep 2021 13:07:40 +0000 (09:07 -0400)]
net: Enter a net epoch around protocol if_up/down notifications
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.
Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31889
Using /etc/jail.{jailname}.conf is nice, however it makes /etc/ very
messy if you have many jails. This patch allows one to move these
config files out of the way into /etc/jail.conf.d/{jailname}.conf.
Note that the same caveat as /etc/jail.*.conf applies: the jail service
will not autodiscover all of these for starting 'all' jails. This is
considered future work, since the behavior matches.
On case-insensitive file systems (most likely to be seen on macOS, where
it is the default), _Fork.o for the new POSIX _Fork function conflicts
with _fork.o for the PSEUDO file. This results in non-determinsitic
behaviour in terms of which ends up being present; if _Fork.o wins then
the build fails to link libc.so due to missing __sys_fork, and if
_fork.o wins then libc silently fails to include the implementation of
_Fork. A similar issue occurred in the past for C99's _Exit conflicting
with exit(2) and was fixed in cb1cb6a2a83f, so this adds a fix based on
that.
As a longer-term solution it might be better to instead make the
generated files use a different prefix that's less likely to conflict
with other things (such as __sys_foo.o given they always contain that)
but that's a rather more invasive change.
Warner Losh [Thu, 9 Sep 2021 23:11:18 +0000 (17:11 -0600)]
tabs: a hacky version of tabs appeared in 1st edition Unix
First edition Unix had an /etc/tabs file. It contained the escape
sequences to set tabs to every 8 stops on an old Teletype Model 37 and
compatible terminals. One would 'cat /etc/tabs' to reset them. Unix at
the time effectively mandated this because the delays in the tty driver
assumed this and tabs didn't work when they were too different from '8'.
Document this historical niggle in HISTORY after it was brought to my
attention on a Hacker News thread.
Currently hkbd counts all key states to be "Up" at the start of
interrupt callback. That results in generation of "Key Up" event for
each key that has been downed before but is not listed in current
report while is still downed.
Fix that with clearing of temporary key data storage bits only for
keys contained in processed report.
psm(4): Disable KVM switch "jitter" clamping for absolute touchpads.
r123442 introduced solution for clamping of PS/2 mice jitter when using
a KVM. Solution is to buffer mouse packets for 0.050ms if mouse activity
has not been seen for more than 0.5 seconds. Then flush that data to driver
if no validation errors found or drop the entire queue otherwise.
While it works well with relative devices it has issues with absolute ones
Depending on history buffering may results in delaying of the touch front
edge for 0.050ms that affects gesture processing (tap detection).
As absolute touchpads usually are built-in devices we can safely disable
bufferization and KVM jitter clamping to avoid such a delays.
- Some configurations, e.g. HP EliteBook 840 G3, come with a dummy card
in the card slot which is detected as a valid SD card. This added long
timeout at boot time. To alleviate the problem, the default timeout is
reduced to one second during the setup phase. [1]
- Some configurations crash at boot if rtsx(4) is defined in the kernel
config. At boot time, without a card inserted, the driver found that
a card is present and just after that a "spontaneous" interrupt is
generated showing that no card is present. To solve this problem,
DELAY(9) is set to one quarter of a second before checking card presence
during driver attach.
- As advised by adrian, taskqueue and DMA are set up sooner during
the driver attach. A heuristic to try to detect configuration needing
inversion was added.
The patch converting fetch to getline
(ee3ca711a898cf41330c320826ea1e0e6e451f1d),
did confuse the capacity of the line buffer with the actual len of the read
line confusing fetch -v.
Mark Johnston [Thu, 9 Sep 2021 13:50:27 +0000 (09:50 -0400)]
osd: Fix racy assertions
osd_register(9) may reallocate and expand the destructor array for a
given object type if no space is available for a new key. This happens
with the object lock held. Thus, when verifying that a given slot in
the array is occupied, we need to hold the object lock to avoid racing
with a reallocation.
Reported by: syzbot+69ce54c7d7d813315dd3@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Ed Maste [Thu, 9 Sep 2021 13:57:22 +0000 (09:57 -0400)]
openssh: remove update notes about upstreamed changes
Two local changes were committed upstream and are present in OpenSSH
8.7p1. Remove references from FREEBSD-upgrade now that we have updated
to that version.
Alex Richardson [Thu, 9 Sep 2021 10:46:53 +0000 (11:46 +0100)]
Export _mmap and __sys_mmap from libc.so
Unlike the other syscalls these two symbols were missing from the
version script. I noticed this while looking into the compiler-rt
runtime libraries for CHERI.
Reviewed by: brooks
Obtained from: https://github.com/CTSRD-CHERI/cheribsd/pull/1063
MFC after: 3 days
The behavior remains the same, but lualoader now uses the more concise
verbiage that forthloader used. This is particularly important because
the previous line would exceed the right boundary of the menu and run
straight into space that would typically be allowed for the logo.
This makes it slightly easier to port logos from forthloader to
lualoader.
5fcdc19a8111 didn't fully resolve the issue. There remains a report
that an ifconfig wlan0 up by itself is insufficient. Ifconfig down
must precede it.
Reported by: Filipe da Silva Santos <contact _ shiori_com_br>
Fixes: 5fcdc19a8111
MFC after: 3 days
Rick Macklem [Wed, 8 Sep 2021 21:29:20 +0000 (14:29 -0700)]
nfsd: Use the COPY_FILE_RANGE_TIMEO1SEC flag
Although it is not specified in the RFCs, the concept that
the NFSv4 server should reply to an RPC request within a
reasonable time is accepted practice within the NFSv4 community.
Without this patch, the NFSv4.2 server attempts to reply to
a Copy operation within 1 second by limiting the copy to
vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at
best, given the large variation in I/O subsystem performance.
This patch uses the COPY_FILE_RANGE_TIMEO1SEC flag added by
commit c5128c48df3c to limit the reply time for a Copy
operation to approximately 1 second.
Mark Johnston [Wed, 8 Sep 2021 03:20:21 +0000 (23:20 -0400)]
sctp: Fix a lock order reversal in sctp_swap_inpcb_for_listen()
When port reuse is enabled in a one-to-one-style socket, sctp_listen()
may call sctp_swap_inpcb_for_listen() to move the PCB out of the "TCP
pool". In so doing it will drop the PCB lock, yielding an LOR since we
now hold several socket locks. Reorder sctp_listen() so that it
performs this operation before beginning the conversion to a listening
socket. Also modify sctp_swap_inpcb_for_listen() to return with PCB
write-locked, since that's what sctp_listen() expects now.
Reviewed by: tuexen
Fixes: bd4a39cc93d9 ("socket: Properly interlock when transitioning to a listening socket")
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31879
Mark Johnston [Wed, 8 Sep 2021 03:02:15 +0000 (23:02 -0400)]
sctp: Fix lock recursion in sctp_swap_inpcb_for_listen()
After commit bd4a39cc93d9 we now hold the global inp info lock across
the call to sctp_swap_inpcb_for_listen(), which attempts to acquire it
again. Since sctp_swap_inpcb_for_listen()'s sole caller is
sctp_listen(), we can simply change it to not try to acquire the lock.
Reported by: syzbot+a76b19ea2f8e1190c451@syzkaller.appspotmail.com
Reported by: syzbot+a1b6cef257ad145b7187@syzkaller.appspotmail.com
Reviewed by: tuexen
Fixes: bd4a39cc93d9 ("socket: Properly interlock when transitioning to a listening socket")
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31878
Test prioritisation and dummynet queues.
We need to give the pipe sufficient bandwidth for dummynet to work.
Given that we can't rely on the TCP connection failing alltogether, but
we can measure the effect of dummynet by imposing a time limit on a
larger data transfer.
If TCP is prioritised it'll get most of the pipe bandwidth and easily
manage to transfer the data in 3 seconds or less. When not prioritised
this will not succeed.
Kristof Provost [Tue, 25 May 2021 14:54:32 +0000 (16:54 +0200)]
ipfw: Introduce dnctl
Introduce a link to the ipfw command, dnctl, for dummynet configuration.
dnctl only handles dummynet configuration, and is part of the effort to
support dummynet in pf.
/sbin/ipfw continues to accept pipe, queue and sched commands, but these can
now also be issued via the new dnctl command.
Because lld 13 and higher default to garbage collecting start/stop
symbols when using --gc-sections, the linker sets used in the i386 boot
loaders will disappear. This leads to the loaders not recognizing any
commands, and failure to boot.
Until we have a good set of linker scripts for the loaders, work around
it by disabling the start-stop-gc feature.
When running in a virtualized environment, TLB invalidations can only
be performed on process scope, as only the hypervisor is allowed to
invalidate a global scope, or else a Program Interrupt is triggered.
Since we are here, also make sure that the register process table
hypercall returns success.
Reviewed by: jhibbits
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D31775
Colin Percival [Tue, 7 Sep 2021 23:59:45 +0000 (16:59 -0700)]
Disable acpi_timer_test by default
This disables testing the ACPI timer by default, forcing the use of
ACPI-fast rather than ACPI-safe. The broken-ACPI-timers workaround
can be re-enabled by setting the hw.acpi.timer_test_enabled=1 tunable.
This speeds up the FreeBSD boot process by 140 ms on an EC2 c5.xlarge
instance.
This change will not be MFCed.
Assuming no problems are reported, acpi_timer_test, the associated
tunable, and the ACPI-safe timecounter should be removed in FreeBSD 15.
Relnotes: The ACPI-safe timer is disabled in favour of ACPI-fast;
if timekeeping issues are observed, please test with
hw.acpi.timer_test_enabled=1 in loader.conf and report
if that fixes the problem.
Colin Percival [Tue, 7 Sep 2021 23:58:18 +0000 (16:58 -0700)]
Hide acpi_timer_test behind a tunable
When hw.acpi.timer_test_enabled is set to 0, this makes acpi_timer_test
return 1 without actually testing the ACPI timer; this results in the
ACPI-fast timecounter always being used rather than potentially using
ACPI-safe.
The ACPI timer testing was introduced in 2002 as a workaround for
errata in Pentium II and Pentium III chipsets, and is unlikely to be
needed in 2021.
While I'm here, add TSENTER/TSEXIT to make it easier to see the time
spent on the test (if it is enabled).