Rick Macklem [Mon, 21 Dec 2020 23:14:53 +0000 (15:14 -0800)]
Add a new "tlscertname" NFS mount option.
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake. For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.
This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided. This alernate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
[if_dwc] add support for multi-descriptor packets in TX path
Original if_dwc driver used m_defrag as an implementation shortcut but on
1000Mb networks it affects performance. Implement multi-descriptor support for
TX path.
Tested on RK3399-Firefly, patch adds ~15% of network throughput.
Reviewed By: manu
Differential Revision: https://reviews.freebsd.org/D27520
Mitchell Horne [Wed, 23 Dec 2020 19:36:17 +0000 (15:36 -0400)]
gdb(4) fix x86 signal reporting
The existing values correspond to x86 exception vector numbers, but the
trap numbers used in the kernel do not match these 1-to-1. Prefer the
definitions from x86/trap.h, as they are what actually get passed to
kdb_trap(). This is of little consequence, as gdb_cpu_signal() only
reports the trap reason (signal number) to the gdb client.
This is limited to the subset of trap values for which kdb_trap() is
reachable.
Mitchell Horne [Wed, 23 Dec 2020 18:37:05 +0000 (14:37 -0400)]
gdb(4): allow bulk write of registers
Add support for the remote 'G' packet. This is not widely used by gdb
when 'P' is supported, but is technically required by any remote gdb
stub implementation [1].
Mitchell Horne [Wed, 23 Dec 2020 18:36:08 +0000 (14:36 -0400)]
gdb(4): handle single register read packets
We support bulk reads of the register set, but not reading specific
registers via the 'p' packet. This is useful at least for the 'call'
command in gdb.
Reviewed by: cem
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
NetApp PR: 44
Differential Revision: https://reviews.freebsd.org/D27644
Ryan Moeller [Wed, 23 Dec 2020 17:42:38 +0000 (12:42 -0500)]
sbin/sysctl: Always honor skip in sysctl_all
Fix broken CTLFLAG_SKIP when present on the first child of the requested
node.
We don't need to ignore skip for the first node because in sysctl_all()
we've implicitly visited the first node already when oid is specified.
The first call to show_var() in here is after we have iterated to the
next node. When the command line specifically requests a non-node sysctl
we go straight into show_var() without calling sysctl_all().
Mark Johnston [Wed, 23 Dec 2020 16:31:47 +0000 (11:31 -0500)]
qatfw: Fix firmware autoloading for qat_c2xxx devices
r368193 was suppsed to rename the MOF firmware image, but the
qat_c2xxxfw makefile defined the two images in the wrong order so the
MMP image was renamed instead.
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Mark Johnston [Wed, 23 Dec 2020 16:15:11 +0000 (11:15 -0500)]
rtsock: Avoid copying uninitialized padding bytes
When copying sockaddrs out to userspace, we pad them to a multiple of
the platform alignment (sizeof(long)). However, some sockaddr sizes,
such as struct sockaddr_dl, are not an integer multiple of the
alignment, so we may end up copying out uninitialized bytes.
Fix this by always bouncing through a pre-zeroed sockaddr_storage.
Reported by: KASAN
Reviewed by: melifaro
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27729
Mark Johnston [Wed, 23 Dec 2020 16:13:31 +0000 (11:13 -0500)]
md: Fix a read-after-free in BIO_GETATTR handling
g_handleattr_int() consumes the bio if the attribute matches, so when we
check bp->bio_cmd bp may have been freed.
Move GETATTR handling to a separate function to avoid the problem. We
do not need to set bio_completed for such bios, g_handleattr_int() will
handle it. Also remove the setting of bio_resid before the
devstat_end_transaction_bio() call. All of the md(4) bio handlers set
bio_resid already.
Reported by: KASAN
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27724
Mark Johnston [Wed, 23 Dec 2020 16:13:00 +0000 (11:13 -0500)]
ffs: Avoid out-of-bounds accesses in the fs_active bitmap
We use a bitmap to track which cylinder groups have changed between
snapshot creation and filesystem suspension. The "legs" of the bitmap
are four bytes wide (see ACTIVESET()) so we must round up the allocation
size to a multiple of four bytes.
I believe this bug is harmless since UMA/kmem_* will both pad the
allocation and zero the full allocation. Note that malloc() does inline
zeroing when the allocation size is known at compile-time.
Alan Somers [Mon, 21 Dec 2020 21:48:29 +0000 (21:48 +0000)]
AIO: remove the kaiocb->bio linkage
Vectored aio will require each aiocb to be associated with multiple
bios, so we can't store a link to the latter from the former. But we
don't really need to. aio_biowakeup already knows the bio it's using,
and the other fields can be stored within the bio and/or buf itself.
Ed Maste [Wed, 23 Dec 2020 13:58:17 +0000 (08:58 -0500)]
make the git commit message template more compact
git's default commit message includes the list of staged, unstaged, and
untracked files; adding our metadata tags and then their descriptions
made for a very long template.
Move the descriptions to the metadata lines themselves.
Reviewed by: bcr
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27664
Kristof Provost [Sun, 20 Dec 2020 20:06:32 +0000 (21:06 +0100)]
pf: Fix unaligned checksum updates
The algorithm we use to update checksums only works correctly if the
updated data is aligned on 16-bit boundaries (relative to the start of
the packet).
x86: stop punishing VMs with low priority for TSC timecounter
I suspect that virtualization techniques improved from the time when we
have to effectively disable TSC use in VM. For instance, it was reported
(complained) in https://github.com/JuliaLang/julia/issues/38877 that
FreeBSD is groundlessly slow on AWS with some loads.
Remove the check and start watching for complaints.
Reviewed by: emaste, grehan
Discussed with: cperciva
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27629
vxlan: stop checking CSUM_ENCAP_VXLAN when converting inner CSUM flags into normal, for decapsulation.
The packet, if processed at this point, was already parsed to be UDP
directed to a vxlan port.
Connect-X 4+ does not provide easy method to infer which parser
processed the packet, so driver cannot set the flag without a lot of
efforts which are only to satisfy the formal requirements.
Andrew Turner [Wed, 23 Dec 2020 07:24:07 +0000 (07:24 +0000)]
Improve address generation in the early arm64 boot
The adr instruction allows for an address of +-1M from the instruction.
If we replace these with an adrp and an add instruction we can generate
an address +-4G. The adrp will get an address of the 4k page the label
is within, and the add uses the :lo12: prefix to add just the low bits
to this address.
This will allow us to move things around with fewer issues than if we
needed to keep them within the +-1MB range.
Mark Johnston [Wed, 23 Dec 2020 05:11:16 +0000 (00:11 -0500)]
netgraph: Fix ng_ether's shutdown handing
When tearing down a VNET, netgraph sends shutdown messages to all of the
nodes before detaching interfaces (SI_SUB_NETGRAPH comes before
SI_SUB_INIT_IF in teardown order). ng_ether nodes handle this by
destroying themselves without detaching from the parent ifnet. Then,
when ifnets go away they detach their ng_ether nodes again, triggering a
use-after-free.
Handle this by modifying ng_ether_shutdown() to detach from the ifnet.
If the shutdown was triggered by an ifnet being destroyed, we will clear
priv->ifp in the ng_ether detach callback, so priv->ifp may be NULL.
Also get rid of the printf in vnet_netgraph_uninit(). It can be
triggered trivially by ng_ether since ng_ether_shutdown() persists the
node unless NG_REALLY_DIE is set.
PR: 233622
Reviewed by: afedorov, kp, Lutz Donnerhacke
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27662
Andrew Gallatin [Sat, 19 Dec 2020 22:04:46 +0000 (22:04 +0000)]
Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA
machine, one must avoid as many NUMA domain crossings as
possible. With SO_REUSEPORT_LB, a number of workers can share a
listen socket. However, even if a worker sets affinity to a core
or set of cores on a NUMA domain, it will receive connections
associated with all NUMA domains in the system. This will lead to
cross-domain traffic when the server writes to the socket or
calls sendfile(), and memory is allocated on the server's local
NUMA node, but transmitted on the NUMA node associated with the
TCP connection. Similarly, when the server reads from the socket,
he will likely be reading memory allocated on the NUMA domain
associated with the TCP connection.
This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A
server can now tell the kernel to filter traffic so that only
incoming connections associated with the desired NUMA domain are
given to the server. (Of course, in the case where there are no
servers sharing the listen socket on some domain, then as a
fallback, traffic will be hashed as normal to all servers sharing
the listen socket regardless of domain). This allows a server to
deal only with traffic that is local to its NUMA domain, and
avoids cross-domain traffic in most cases.
This patch, and a corresponding small patch to nginx to use
TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted
https media content from dual-socket Xeons with only 13% (as
measured by pcm.x) cross domain traffic on the memory controller.
Andrew Gallatin [Sat, 19 Dec 2020 21:46:09 +0000 (21:46 +0000)]
Optionally bind ktls threads to NUMA domains
When ktls_bind_thread is 2, we pick a ktls worker thread that is
bound to the same domain as the TCP connection associated with
the socket. We use roughly the same code as netinet/tcp_hpts.c to
do this. This allows crypto to run on the same domain as the TCP
connection is associated with. Assuming TCP_REUSPORT_LB_NUMA
(D21636) is in place & in use, this ensures that the crypto source
and destination buffers are local to the same NUMA domain as we're
running crypto on.
This change (when TCP_REUSPORT_LB_NUMA, D21636, is used) reduces
cross-domain traffic from over 37% down to about 13% as measured
by pcm.x on a dual-socket Xeon using nginx and a Netflix workload.
Gordon Bergling [Sat, 19 Dec 2020 14:54:28 +0000 (14:54 +0000)]
libc: Fix most issues reported by mandoc
- varios "new sentence, new line" warnings
- varios "sections out of conventional order" warnings
- varios "unusual Xr order" warnings
- varios "missing section argument" warnings
- varios "no blank before trailing delimiter" warnings
- varios "normalizing date format" warnings
Gordon Bergling [Sat, 19 Dec 2020 13:51:46 +0000 (13:51 +0000)]
zonectl(8): Fix a few issues reported by mandoc
- Add missing quotation mark for a comment above the .Dd
- inserting missing end of block: Sh breaks Bd
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp before Bd
- empty block: Bd
Gordon Bergling [Sat, 19 Dec 2020 13:36:59 +0000 (13:36 +0000)]
bluetooth: Fix a mandoc related issues
- new sentence, new line
- sections out of conventional order: Sh FILES
- unusual Xr order: bthost(1) after bthidd(8)
- no blank before trailing delimiter
- whitespace at end of input line
- sections out of conventional order: Sh EXIT STATUS
Gordon Bergling [Sat, 19 Dec 2020 12:47:40 +0000 (12:47 +0000)]
ipfw(8): Fix a few mandoc related issues
- no blank before trailing delimiter
- missing section argument: Xr inet_pton
- skipping paragraph macro: Pp before Ss
- unusual Xr order: syslogd after sysrc
- tab in filled text
There were a few multiline NAT examples which used the .Dl macro with
tabs. I converted them to .Bd, which is a more suitable macro for that case.
Gordon Bergling [Sat, 19 Dec 2020 11:47:38 +0000 (11:47 +0000)]
nvmecontrol(8): Fix a few mandoc related issues and add a SEE ALSO section
- inserting missing end of block: Ss breaks Bl
- skipping paragraph macro: Pp before Ss
- referenced manual not found: Xr nvme 4 (2 times)
- unknown standard specifier: St The
The macro .St can only be used for standards known by mdoc(7). So add a
SEE ALSO section and add a reference to the NVM Express Base Specification.
Gordon Bergling [Sat, 19 Dec 2020 10:15:58 +0000 (10:15 +0000)]
bhnd_erom(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp after Ss
- skipping paragraph macro: Pp at the end of Ss
- unusual Xr punctuation: none before bhnd_driver_get_erom_class(9)
- unusual Xr punctuation: none before bus_space(9)
Gordon Bergling [Sat, 19 Dec 2020 10:11:37 +0000 (10:11 +0000)]
bhnd(9): Fix a few mandoc related issues
- skipping paragraph macro: Pp before Bl
- skipping paragraph macro: Pp at the end of Ss
- missing section argument: Xr device_set_desc
- unusual Xr punctuation: none before bhnd_erom(9)
Gordon Bergling [Sat, 19 Dec 2020 09:55:02 +0000 (09:55 +0000)]
disk(9): Fix a few mandoc related errors
- function name without markup: g_io_deliver()
- function name without markup: disk_gone()
- sections out of conventional order: Sh SEE ALSO
- referenced manual not found: Xr MAKE_DEV 9
Actually the man page of MAKE_DEV has never existed.
Ryan Libby [Sat, 19 Dec 2020 08:38:31 +0000 (08:38 +0000)]
rtld-elf: link udivmoddi4 from compiler_rt
This fixes the gcc9 build of rtld-elf32 on amd64, which needed an
implementation of udivmoddi4.
rtld-elf uses certain functions normally found in libc, and so it
includes certain files from libc in its own build. It has two
mechanisms to include files from libc: one that rebuilds source files in
the rtld-elf environment, and one that extracts object files from a
purpose-built no-SSP PIC archive.
In addition to libc functions, rtld-elf may need to link functions
normally found in libcompiler_rt (formerly libgcc). Now, add an ability
to rebuild libcompiler_rt source files in the rtld-elf environment. We
don't yet have a need for an object file extraction mechanism.
libcompiler_rt could also supply udivdi3 and umoddi3, but leave them
alone for now.
Ryan Libby [Sat, 19 Dec 2020 08:38:27 +0000 (08:38 +0000)]
rtld-libc: fix incremental build
ar cr is an update of an archive, not a creation of a new one. During
incremental builds (e.g. with meta mode) the archive was not getting
cleaned, and so could retain now-deleted objects from previous builds.
Now, delete the archive before creating/updating it.
Kyle Evans [Sat, 19 Dec 2020 03:30:06 +0000 (03:30 +0000)]
kern: cpuset: allow jails to modify child jails' roots
This partially lifts a restriction imposed by r191639 ("Prevent a superuser
inside a jail from modifying the dedicated root cpuset of that jail") that's
perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still
cannot modify their own cpuset, but they can modify child jails' roots to
further restrict them or widen them back to the modifying jails' own mask.
As a side effect of this, the system root may once again widen the mask of
jails as long as they're still using a subset of the parent jails' mask.
This was previously prevented by the fact that cpuset_getroot of a root set
will return that root, rather than the root's parent -- cpuset_modify uses
cpuset_getroot since it was introduced in r327895, previously it was just
validating against set->cs_parent which allowed the system root to widen
jail masks.
Pedro F. Giffuni [Sat, 19 Dec 2020 03:07:38 +0000 (03:07 +0000)]
login(1): when exporting variables check the result of setenv(3)
When exporting a variable we correctly check all the preconditions that
could make setenv(3) fail. Checking the setenv(3) return value seems
redundant, but given that login(1) is critical, it doesn't hurt to have
a post-check.
This change is based on the "Principles of Secure Coding" course by
Matthew Bishop, PhD., which specifically discusses this code in FreeBSD.
(This change redoes r368776 due to a silly mistake)
Pedro F. Giffuni [Sat, 19 Dec 2020 02:23:53 +0000 (02:23 +0000)]
login(1): when exporting variables check the result of setenv(3)
When exporting a variable we correctly check all the preconditions that
could make setenv(3) fail. Checking the setenv(3) return value seems
redundant, but given that login(1) is critical, it doesn't hurt to have
a post-check.
This change is based on the "Principles of Secure Coding" course by
Matthew Bishop, PhD., which specifically discusses this code in FreeBSD.
Jessica Clarke [Fri, 18 Dec 2020 23:31:36 +0000 (23:31 +0000)]
usb: Replace ITUNERNET vendor with MICROCHIP and improve product names
These Mini-Box LCDs are using Microchip components and sub-licensed product
IDs. Whilst here, update the constant names and descriptions for the products
to use the names listed on the manufacturer's website rather than vague ones.
The picoLCD 4x20 is named that on the manufacturer's website so prefer that
name, even though linux-usb.org lists it with the numbers reversed as one might
expect.
Kirk McKusick [Fri, 18 Dec 2020 23:28:27 +0000 (23:28 +0000)]
Rename pass4check() to freeblock() and move from pass4.c to inode.c.
The new name more accurately describes what it does and the file move
puts it with other similar functions. Done in preparation for future
cleanups. No functional differences intended.
Sponsored by: Netflix
Historic Footnote: my last FreeBSD svn commit
Switch direct rt fields access in rtsock.c to newly-create field acessors.
rtsock code was build around the assumption that each rtentry record
in the system radix tree is a ready-to-use sockaddr. This assumptions
turned out to be not quite true:
* masks have their length tweaked, so we have rtsock_fix_netmask() hack
* IPv6 addresses have their scope embedded, so we have another explicit
deembedding hack.
Change the code to decouple rtentry internals from rtsock code using
newly-created rtentry accessors. This will allow to eventually eliminate
both of the hacks and change rtentry dst/mask format.
Mark Johnston [Fri, 18 Dec 2020 16:04:48 +0000 (16:04 +0000)]
acpi: Ensure that adjacent memory affinity table entries are coalesced
The SRAT may contain multiple distinct entries that together describe a
contiguous region of physical memory. In this case we were not
coalescing the corresponding entries in the memory affinity table, which
led to fragmented phys_avail[] entries. Since r338431 the vm_phys_segs[]
entries derived from phys_avail[] will be coalesced, resulting in a
situation where vm_phys_segs[] entries do not have a covering
phys_avail[] entry. vm_page_startup() will not add such segments to the
physical memory allocator, leaving them unused.
Reported by: Don Morris <dgmorris@earthlink.net>
Reviewed by: kib, vangyzen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27620
Jessica Clarke [Fri, 18 Dec 2020 15:07:14 +0000 (15:07 +0000)]
virtio_mmio: Fix feature negotiation copy-paste issue in r361943
This caused us to write to the low half of the feature word twice, once with
the high bits and once with the low bits. Common legacy device implementations
seem to be fairly lenient about being able to write to the feature bits
multiple times, but Arm's models use a stricter implementation that will ignore
the second write. This fixes using vtnet(4) on those models.
Marcin Wojtas [Fri, 18 Dec 2020 10:09:21 +0000 (10:09 +0000)]
Fix abort in jemalloc extent coalescing.
Fix error in extent_try_coalesce_impl(), which could cause abort
to happen when trying to coalesce extents backwards. The error could
happen because of how extent_before_get() function works. This function
gets address of previous extent, by subtracting page size from current
extent address. If current extent is located at PAGE_SIZE offset, this
address resolved to 0x0000. An assertion in rtree_leaf_elm_lookup
then caused the running program to abort.
This problem was discovered when trying to build world on 32-bit
machines with ASLR and PIE enabled. The problem was encountered
on armv7 and i386 machines, but most likely other 32-bit
architectures are affected as well.
While this patch fixes one problem with buildworld on 32-bit platforms
with ASLR, the build still fails, however it happens much later
and due to lack of memory.
The change is aligned with accepted fix in the upstream Jemalloc
repository (https://github.com/jemalloc/jemalloc/pull/1973).
As it doesn't apply on top of Jemalloc tree, its updated version
was eventually merged: https://github.com/jemalloc/jemalloc/pull/2003
pci_iov: When pci_iov_detach(9) is called, destroy VF children
instead of bailing out with EBUSY if there are any.
If driver module is unloaded, or just device is forcibly detached from
the driver, there is no way for driver to correctly unload otherwise.
Esp. if there are resources dedicated to the VFs which prevent turning
down other resources.
Peter Grehan [Fri, 18 Dec 2020 00:38:48 +0000 (00:38 +0000)]
Fix issues with various VNC clients.
- support VNC version 3.3 (macos "Screen Sharing" builtin client)
- wait until client has requested an update prior to sending framebuffer data
- don't send an update if no framebuffer updates detected
- increase framebuffer poll frequency to 30Hz, and double that when
kbd/mouse input detected
- zero uninitialized array elements in rfb_send_server_init_msg()
- fix overly large allocation in rfb_init()
- use atomics for flags shared between input and output threads
- use #defines for constants
This work was contributed by Marko Kiiskila, with reuse of some earlier
work by Henrik Gulbrandsen.