Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors
for event notification. When porting Linux software, it is currently
usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues
are not passable between processes. And, as noted by the author of
epoll-shim, even if they were, the library state would also have to be
passed somehow. This came up when debugging strange HW video decode
failures in Firefox. A native implementation would avoid these problems
and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for
the Linuxulator), it's pretty easy to expose it natively, which is what
this patch does.
Mark Johnston [Sat, 26 Dec 2020 21:07:40 +0000 (16:07 -0500)]
sendfile: Ensure that sfio->npages is initialized
We initialize sfio->npages only when some I/O is required to satisfy the
request. However, sendfile_iodone() contains an INVARIANTS-only check
that references sfio->npages, and this check is executed even if no I/O
is performed, so the check may use an uninitialized value.
Fix the problem by initializing sfio->npages earlier. Note that
sendfile_swapin() always initializes the page array. In some rare cases
we need to trim the page array so ensure that sfio->npages gets updated
accordingly.
Reported by: syzkaller (with KASAN)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27726
Andrew Turner [Wed, 23 Dec 2020 18:56:09 +0000 (18:56 +0000)]
Use the base address for early arm64 page tables
Use the kernel physical base rather than the ttbr0 base when building
the kernel identity map. The latter is correct with current assumptions
but may not always be the case.
Marius Strobl [Sat, 26 Dec 2020 18:07:50 +0000 (19:07 +0100)]
mlphy(4)/tlphy(4): Remove obsolete drivers
These drivers should have been removed along with tl(4) as part of 7c897ca91fe1cdb785531d2f5aa0d441c1d73142 and r347918 respectively
as these fromer made sure to only ever attach to the latter, e. g.:
<...>
static int
tlphy_probe(device_t dev)
{
if (!mii_dev_mac_match(dev, "tl"))
return (ENXIO);
<...>
Jamie Gritton [Sat, 26 Dec 2020 18:39:34 +0000 (10:39 -0800)]
jail: Fix an O(n^2) loop when adding jails
When a jail is added using the default (system-chosen) JID, and
non-default-JID jails already exist, a loop through the allprison
list could restart and result in unnecessary O(n^2) behaviour.
There should never be more than two list passes required.
Also clean up inefficient (though still O(n)) allprison list traversal
when finding jails by ID, or when adding jails in the common case of
all default JIDs.
Ed Maste [Sat, 26 Dec 2020 16:43:44 +0000 (11:43 -0500)]
gprof: Retire a.out support
FreeBSD has used ELF binaries/libraries for decades, but still has some
support for legacy a.out binaries. Portions of this have been retired
over time, but support remained in ldd, ldconfig, and gprof.
Retire gprof support; if anyone needs to do development on a.out
binaries still they will be best served by installing a full FreeBSD 2.x
or other obsolete version in a jail.
Kernel support for executing a.out binaries is unchnaged.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27480
Ulrich Spörlein [Wed, 23 Dec 2020 21:29:34 +0000 (22:29 +0100)]
Fix newvers.sh to no longer print an outdated SVN rev
We have stopped using SVN, so the notes containing the old SVN revisions
are no longer populated, so fall back to purely counting the number of
commits (currently at about 255337).
Also turn the format more into what git-describe produces, with a name
first, then the number of commits and the hash last. Note that as we
don't tag anything on `main`, git describe will never produce something
useful there and finds the newest vendor tag that was merged in instead.
Use light-weight versions of routing lookup functions in ng_netflow.
Use recently-added combination of `fib[46]_lookup_rt()` which
returns rtentry & raw nexthop with `rt_get_inet[6]_plen()` which
returns address/prefix length of prefix inside `rt`.
Add `nhop_select_func()` wrapper around inlined `nhop_select()` to
allow callers external to the routing subsystem select the proper
nexthop from the multipath group without including internal headers.
New calls does not require reference counting objects and reduce
the amount of copied/processed rtentry data.
Kyle Evans [Tue, 22 Dec 2020 21:36:40 +0000 (15:36 -0600)]
build: remove the option to build gnugrep
Unconditionally install bsdgrep as grep, bootstrap or not. Remove all
build glue and stop installing both gnugrep and libgnuregex now that
all consumers of the latter are gone.
Add tcgetwinsize(3) and tcsetwinsize(3) to termios
These functions get/set tty winsize respectively, and are trivial wrappers
around corresponding termio ioctls.
The functions are expected to be a part of POSIX.1 issue 8:
https://www.austingroupbugs.net/view.php?id=1151#c3856.
They are currently available in NetBSD and in musl libc.
Changes from 1.4.5:
* https://github.com/facebook/zstd/releases/tag/v1.4.8
* https://github.com/facebook/zstd/releases/tag/v1.4.7
(and there was no public v1.4.6)
Conflicts:
sys/contrib/zstd/lib/common/zstd_internal.h (new ZSTD_NO_INTRINSICS)
This change introduces framework that allows to dynamically
attach or detach longest prefix match (lpm) lookup algorithms
to speed up datapath route tables lookups.
Framework takes care of handling initial synchronisation,
route subscription, nhop/nhop groups reference and indexing,
dataplane attachments and fib instance algorithm setup/teardown.
Framework features automatic algorithm selection, allowing for
picking the best matching algorithm on-the-fly based on the
amount of routes in the routing table.
Currently framework code is guarded under FIB_ALGO config option.
An idea is to enable it by default in the next couple of weeks.
The following algorithms are provided by default:
IPv4:
* bsearch4 (lockless binary search in a special IP array), tailored for
small-fib (<16 routes)
* radix4_lockless (lockless immutable radix, re-created on every rtable change),
tailored for small-fib (<1000 routes)
* radix4 (base system radix backend)
* dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized
for large-fib (D27412)
IPv6:
* radix6_lockless (lockless immutable radix, re-created on every rtable change),
tailed for small-fib (<1000 routes)
* radix6 (base system radix backend)
* dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized
for large-fib (D27412)
Rick Macklem [Thu, 24 Dec 2020 22:20:06 +0000 (14:20 -0800)]
mount_nfs(8): add a description for the new "tlscertname" option
commit 665b1365fe8e added a new NFS mount option that is used to set a
non-default X.509 certificate, that can be used for nfs-over-tls NFS
mounts.
This patch adds a description for it to the man page.
ukbd(4): Push LED events in ioctl handler rather than in xfer callback
If LED state is set through evdev interface, than asynchronous nature
of USB transfer callback can lead to change of order of events echoed
back to userland as it causes LED events to be echoed with some lag.
Fix that with echoing of LED events synchronously in ioctl handler.
Unlike AT keyboards, HID devices are able to send all pc105 key
states within a single report. Let evdev to transmit all key state
changes within a single report too.
wmt(4): Add support for hardware timestamp reporting
Hardware timestamp reporting is disabled by default as it produces many
extra events which are not handled by consumers like libinput.
Add hw.usb.wmt.timestamps=1 tunable to loader.conf to enable it.
In Hybrid mode, the number of contacts that can be reported in one
report is less than the maximum number of contacts that the device
supports. For example, a device that supports a maximum of 4
concurrent physical contacts, can set up its top-level collection to
deliver a maximum of two contacts in one report. If four contact
points are present, the device can break these up into two serial
reports that deliver two contacts each.
Rick Macklem [Mon, 21 Dec 2020 23:14:53 +0000 (15:14 -0800)]
Add a new "tlscertname" NFS mount option.
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake. For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.
This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided. This alernate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
[if_dwc] add support for multi-descriptor packets in TX path
Original if_dwc driver used m_defrag as an implementation shortcut but on
1000Mb networks it affects performance. Implement multi-descriptor support for
TX path.
Tested on RK3399-Firefly, patch adds ~15% of network throughput.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D27520
Mitchell Horne [Wed, 23 Dec 2020 19:36:17 +0000 (15:36 -0400)]
gdb(4) fix x86 signal reporting
The existing values correspond to x86 exception vector numbers, but the
trap numbers used in the kernel do not match these 1-to-1. Prefer the
definitions from x86/trap.h, as they are what actually get passed to
kdb_trap(). This is of little consequence, as gdb_cpu_signal() only
reports the trap reason (signal number) to the gdb client.
This is limited to the subset of trap values for which kdb_trap() is
reachable.
Mitchell Horne [Wed, 23 Dec 2020 18:37:05 +0000 (14:37 -0400)]
gdb(4): allow bulk write of registers
Add support for the remote 'G' packet. This is not widely used by gdb
when 'P' is supported, but is technically required by any remote gdb
stub implementation [1].
Mitchell Horne [Wed, 23 Dec 2020 18:36:08 +0000 (14:36 -0400)]
gdb(4): handle single register read packets
We support bulk reads of the register set, but not reading specific
registers via the 'p' packet. This is useful at least for the 'call'
command in gdb.
Reviewed by: cem
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 44
Differential Revision: https://reviews.freebsd.org/D27644
Ryan Moeller [Wed, 23 Dec 2020 17:42:38 +0000 (12:42 -0500)]
sbin/sysctl: Always honor skip in sysctl_all
Fix broken CTLFLAG_SKIP when present on the first child of the requested
node.
We don't need to ignore skip for the first node because in sysctl_all()
we've implicitly visited the first node already when oid is specified.
The first call to show_var() in here is after we have iterated to the
next node. When the command line specifically requests a non-node sysctl
we go straight into show_var() without calling sysctl_all().
Mark Johnston [Wed, 23 Dec 2020 16:31:47 +0000 (11:31 -0500)]
qatfw: Fix firmware autoloading for qat_c2xxx devices
r368193 was suppsed to rename the MOF firmware image, but the
qat_c2xxxfw makefile defined the two images in the wrong order so the
MMP image was renamed instead.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Mark Johnston [Wed, 23 Dec 2020 16:15:11 +0000 (11:15 -0500)]
rtsock: Avoid copying uninitialized padding bytes
When copying sockaddrs out to userspace, we pad them to a multiple of
the platform alignment (sizeof(long)). However, some sockaddr sizes,
such as struct sockaddr_dl, are not an integer multiple of the
alignment, so we may end up copying out uninitialized bytes.
Fix this by always bouncing through a pre-zeroed sockaddr_storage.
Reported by: KASAN
Reviewed by: melifaro
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27729
Mark Johnston [Wed, 23 Dec 2020 16:13:31 +0000 (11:13 -0500)]
md: Fix a read-after-free in BIO_GETATTR handling
g_handleattr_int() consumes the bio if the attribute matches, so when we
check bp->bio_cmd bp may have been freed.
Move GETATTR handling to a separate function to avoid the problem. We
do not need to set bio_completed for such bios, g_handleattr_int() will
handle it. Also remove the setting of bio_resid before the
devstat_end_transaction_bio() call. All of the md(4) bio handlers set
bio_resid already.
Reported by: KASAN
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27724
Mark Johnston [Wed, 23 Dec 2020 16:13:00 +0000 (11:13 -0500)]
ffs: Avoid out-of-bounds accesses in the fs_active bitmap
We use a bitmap to track which cylinder groups have changed between
snapshot creation and filesystem suspension. The "legs" of the bitmap
are four bytes wide (see ACTIVESET()) so we must round up the allocation
size to a multiple of four bytes.
I believe this bug is harmless since UMA/kmem_* will both pad the
allocation and zero the full allocation. Note that malloc() does inline
zeroing when the allocation size is known at compile-time.
Alan Somers [Mon, 21 Dec 2020 21:48:29 +0000 (21:48 +0000)]
AIO: remove the kaiocb->bio linkage
Vectored aio will require each aiocb to be associated with multiple
bios, so we can't store a link to the latter from the former. But we
don't really need to. aio_biowakeup already knows the bio it's using,
and the other fields can be stored within the bio and/or buf itself.
Ed Maste [Wed, 23 Dec 2020 13:58:17 +0000 (08:58 -0500)]
make the git commit message template more compact
git's default commit message includes the list of staged, unstaged, and
untracked files; adding our metadata tags and then their descriptions
made for a very long template.
Move the descriptions to the metadata lines themselves.
Reviewed by: bcr
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27664