FreeBSD/FreeBSD.git
Konstantin Belousov [Thu, 12 Nov 2020 02:52:01 +0000 (02:52 +0000)]
bhyve: remove a hack to map all 8G BARs 1:1

Suggested and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27186

Konstantin Belousov [Thu, 12 Nov 2020 02:25:10 +0000 (02:25 +0000)]
mlx5en: Set ifmr_current same as ifmr_active.

This both:
- makes the ifconfig media line similar to that of other drivers.
- fixes ENXIO when the paradoxical current media word is not registered.

Now e.g.
      ifconfig mce0 -mediaopt txpause,rxpause
works by disabling pauses if enabled.

Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week

Konstantin Belousov [Thu, 12 Nov 2020 02:23:27 +0000 (02:23 +0000)]
mlx5en: stop ignoring pauses and flow in the media reqs.

Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week

Konstantin Belousov [Thu, 12 Nov 2020 02:22:16 +0000 (02:22 +0000)]
mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types.

Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week

Konstantin Belousov [Thu, 12 Nov 2020 02:21:14 +0000 (02:21 +0000)]
mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add().

Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week

Navdeep Parhar [Thu, 12 Nov 2020 01:18:05 +0000 (01:18 +0000)]
cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs.

This should have been part of r366929.

MFC after: 3 days
Sponsored by: Chelsio Communications

Konstantin Belousov [Thu, 12 Nov 2020 00:51:53 +0000 (00:51 +0000)]
bhyve: increase allowed size for 64bit BAR allocation below 4G from 32 to 128 MB.

Reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27095

Konstantin Belousov [Thu, 12 Nov 2020 00:46:53 +0000 (00:46 +0000)]
bhyve: avoid allocating BARs above the end of supported physical addresses.

Read CPUID leaf 0x80000008 to determine the max supported phys address and
create the BAR region right below it, reserving 1/4 of the supported guest
physical address space for the 64bit BAR mappings.

PR:    250802 (although the issue from the PR is not fixed by this change)
Noted and reviewed by: grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27095

Mateusz Guzik [Thu, 12 Nov 2020 00:29:23 +0000 (00:29 +0000)]
thread: move nthread management out of tid_alloc

While this adds more work in the single-threaded case, it also enables
SMP-related speedups.

Kyle Evans [Wed, 11 Nov 2020 22:35:23 +0000 (22:35 +0000)]
umtx: drop incorrect timespec32 definition

This works for amd64, but not for other architectures -- drop it, because we
already have a proper definition in sys/compat/freebsd32/freebsd32.h that
correctly uses time32_t.

MFC after: 1 week

Alexander Motin [Wed, 11 Nov 2020 21:59:39 +0000 (21:59 +0000)]
Make CTL handle increased MAXPHYS more gracefully.

Before this change, CTL always allocated MAXPHYS-sized buffers, even for 4KB
I/O, which is even more overkill with a MAXPHYS of 1MB.  This change limits
the maximum allocation to 512KB if MAXPHYS is bigger and, if that limit is
above 128KB, adds a new 128KB UMA zone for smaller I/Os.  The patch factors
out alloc/free, so later we could make it use more zones or malloc() if we'd
like.

MFC after: 1 week
Sponsored by: iXsystems, Inc.

Mateusz Guzik [Wed, 11 Nov 2020 18:45:06 +0000 (18:45 +0000)]
thread: batch tid_free calls in thread_reap

This eliminates the highly pessimal pattern of relocking from multiple
CPUs in quick succession. Note this is still globally serialized.

Mateusz Guzik [Wed, 11 Nov 2020 18:43:51 +0000 (18:43 +0000)]
thread: lockless zombie list manipulation

This gets rid of the most contended spinlock seen when creating/destroying
threads in a loop (modulo kstack).

Tested by: alfredo (ppc64), bdragon (ppc64)

Mark Johnston [Wed, 11 Nov 2020 18:00:06 +0000 (18:00 +0000)]
iflib: Free full mbuf chains when draining transmit queues

Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com>
Reviewed by: gallatin, hselasky
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27179

Mark Johnston [Wed, 11 Nov 2020 17:16:39 +0000 (17:16 +0000)]
vm_map: Handle kernel map entry allocator recursion

On platforms without a direct map[*], vm_map_insert() may in rare
situations need to allocate a kernel map entry in order to allocate
kernel map entries.  This poses a problem similar to the one solved for
vmem boundary tags by vmem_bt_alloc().  In fact the kernel map case is a
bit more complicated since we must allocate entries with the kernel map
locked, whereas vmem can recurse into itself because boundary tags are
allocated up-front.

The solution is to add a custom slab allocator for kmapentzone which
allocates KVA directly from kernel_map, bypassing the kmem_* layer.
This avoids mutual recursion with the vmem btag allocator.  Then, when
vm_map_insert() allocates a new kernel map entry, it avoids triggering
allocation of a new slab with M_NOVM until after the insertion is
complete.  Instead, vm_map_insert() allocates from the reserve and sets
a flag in kernel_map to trigger re-population of the reserve just before
the map is unlocked.  This places an implicit upper bound on the number
of kernel map entries that may be allocated before the kernel map lock
is released, but in general a bound of 1 suffices.

[*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a
configuration required by some kernel sanitizers.

Discussed with: kib, rlibby
Reported by: andrew
Tested by: pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26851

Andrey V. Elsukov [Wed, 11 Nov 2020 15:53:36 +0000 (15:53 +0000)]
Fix possible NULL pointer dereference.

lagg(4) replaces the if_output method of its child interfaces and expects
that this method is called only by those child interfaces.  However,
lagg_port_output() can also be called by children of child interfaces,
in which case the ifnet's if_lagg field is NULL.  Add a check that lp
is not NULL.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC

Mark Johnston [Wed, 11 Nov 2020 15:01:17 +0000 (15:01 +0000)]
vmm: Make pmap_invalidate_ept() wait synchronously for guest exits

Currently EPT TLB invalidation is done by incrementing a generation
counter and issuing an IPI to all CPUs currently running vCPU threads.
The VMM inner loop caches the most recently observed generation on each
host CPU and invalidates TLB entries before executing the VM if the
cached generation number is not the most recent value.
pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing
guest instructions and reload the generation number.  However, it does
not actually wait for vCPUs to exit, potentially creating a window where
guests may continue to reference stale TLB entries.

Fix the problem by bracketing guest execution with an SMR read section
which is entered before loading the invalidation generation.  Then,
pmap_invalidate_ept() increments the current write sequence before
loading pm_active and sending IPIs, and polls readers to ensure that all
vCPUs potentially operating with stale TLB entries have exited before
pmap_invalidate_ept() returns.

Also ensure that unsynchronized loads of the generation counter are
wrapped with atomic(9), and stop (inconsistently) updating the
invalidation counter and pm_active bitmask with acquire semantics.

Reviewed by: grehan, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26910

Mateusz Piotrowski [Wed, 11 Nov 2020 14:53:03 +0000 (14:53 +0000)]
Document in the synopsis that -0 cannot be used with the utility argument

Mark Johnston [Wed, 11 Nov 2020 14:03:49 +0000 (14:03 +0000)]
Remove an extraneous parameter from SIGIO_ASSERT_LOCKED()

Reported by: hselasky
MFC with: r367588

Mark Johnston [Wed, 11 Nov 2020 13:48:07 +0000 (13:48 +0000)]
ffs: Clamp BIO_SPEEDUP length

On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by
softdep_request_cleanup() may be negative when assigned to bp->b_bcount,
which has type "long".

Clamp the size to LONG_MAX.  Also convert the unused g_io_speedup() to
use an off_t for the magnitude of the shortage for consistency with
softdep_send_speedup().

Reviewed by: chs, kib
Reported by: pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27081

Mark Johnston [Wed, 11 Nov 2020 13:44:27 +0000 (13:44 +0000)]
Fix a pair of races in SIGIO registration

First, funsetownlst() looks at the first element of the list to see
whether it's processing a process or a process group list.  Then it
acquires the global sigio lock and processes the list.  However, nothing
prevents the first sigio tracker from being freed by a concurrent
funsetown() before the sigio lock is acquired.

Fix this by acquiring the global sigio lock immediately after checking
whether the list is empty.  Callers of funsetownlst() ensure that new
sigio trackers cannot be added concurrently.

Second, fsetown() uses funsetown() to remove an existing sigio structure
from a file object.  However, funsetown() uses a racy check to avoid the
sigio lock, so two threads may call fsetown() on the same file object,
both observe that no sigio tracker is present, and enqueue two sigio
trackers for the same file object.  Then, if the file object is
destroyed, funsetown() will only remove one sigio tracker, and
funsetownlst() may later trigger a use-after-free when it clears the
file object reference for each entry in the list.

Fix this by introducing funsetown_locked(), which avoids the racy check.

Reviewed by: kib
Reported by: pho
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27157

Mateusz Guzik [Wed, 11 Nov 2020 08:51:04 +0000 (08:51 +0000)]
thread: add more fine-grained tidhash locking

Note this still does not scale, but it is enough to move it out of the way
for the foreseeable future.

In particular, a trivial benchmark spawning/killing threads stops contending
on tidhash.

Mateusz Guzik [Wed, 11 Nov 2020 08:50:04 +0000 (08:50 +0000)]
thread: rework tidhash vs proc lock interaction

Apart from minor cleanup, this gets rid of the proc unlock/lock cycle on
thread exit that worked around a LOR against the tidhash lock.

Mateusz Guzik [Wed, 11 Nov 2020 08:48:43 +0000 (08:48 +0000)]
thread: fix thread0 tid allocation

Startup code hardcodes the value instead of allocating it, so the first
spawned thread would then get a duplicate tid.

Pointy hat: mjg

Warner Losh [Tue, 10 Nov 2020 23:25:16 +0000 (23:25 +0000)]
Add INIT_ALL_ZERO and INIT_ALL_PATTERN to kern.opts.mk

These options need to be in the kern.opts.mk file to take effect for kernel
and module builds. This also reverts r367579 since that's not needed with
this fix: the host's bsd.opts.mk is irrelevant.

Reviewed by: brooks@
Differential Revision:  https://reviews.freebsd.org/D27170

Mateusz Guzik [Tue, 10 Nov 2020 21:29:10 +0000 (21:29 +0000)]
thread: tidy up r367543

"locked" variable is spurious in the committed version.

Brooks Davis [Tue, 10 Nov 2020 21:12:32 +0000 (21:12 +0000)]
Be more tolerant of share/mk and kern.mk mismatch

When building out-of-tree modules, it appears that the system share/mk
is used together with the source tree's sys/conf/kern.mk.  That results
in MK_INIT_ALL_ZERO being undefined.  In the interest of maximum
compatibility, check that MK_INIT_ALL_* and COMPILER_FEATURES are defined
before comparing their values.

Reported by: mmacy
Sponsored by: DARPA

John Baldwin [Tue, 10 Nov 2020 19:54:39 +0000 (19:54 +0000)]
Clear tp->tod in t4_pcb_detach().

Otherwise, a socket can have a non-NULL tp->tod while TF_TOE is clear.
In particular, if a newly accepted socket falls back to non-TOE due to
an active open failure, the non-TOE socket will still have tp->tod set
even though TF_TOE is clear.

Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D27028

Brooks Davis [Tue, 10 Nov 2020 19:15:13 +0000 (19:15 +0000)]
Support initializing stack variables on function entry

There are two options:
 - WITH_INIT_ALL_ZERO: Zero all variables on the stack.
 - WITH_INIT_ALL_PATTERN: Initialize variables with well-defined patterns.

The exact patterns are a compiler implementation detail and vary by type.
They are somewhat documented in the LLVM commit message:
https://reviews.llvm.org/rL349442
I've used WITH_INIT_ALL_* to match Microsoft's InitAll feature rather
than naming them after the LLVM-specific compiler flags.

In a range of consumer products, options like these are used in
both debug and production builds, with debug builds using patterns
(intended to provoke crashes on use of uninitialized values) and
production builds using zeros (deemed more likely to lead to harmless
misbehavior or NULL-pointer dereferences).

Reviewed by: emaste
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27131

John Baldwin [Tue, 10 Nov 2020 19:09:35 +0000 (19:09 +0000)]
Add C startup code tests for PIE binaries.

- Force dynamic to be a non-PIE binary.

- Add a dynamicpie test which uses a PIE binary.

Reviewed by: andrew
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27127

John Baldwin [Tue, 10 Nov 2020 19:07:30 +0000 (19:07 +0000)]
Fix dso_handle_check for PIE executables.

PIE executables use crtbeginS.o and have a non-NULL dso_handle as a
result.

Reviewed by: andrew, emaste
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27126

John Baldwin [Tue, 10 Nov 2020 19:04:54 +0000 (19:04 +0000)]
Rename __JCR_LIST__ to __JCR_END__ in crtend.c.

This is more consistent with the names used for the .ctors and .dtors
symbols and better reflects __JCR_END__'s role.

Reviewed by: andrew
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27125

Jonathan T. Looney [Tue, 10 Nov 2020 18:12:09 +0000 (18:12 +0000)]
When destroying a UMA zone which has a reserve (set with
uma_zone_reserve()), messages like the following appear on the console:
"Freed UMA keg (Test zone) was not empty (0 items). Lost 528 pages of
memory."

When keg_drain_domain() is draining the zone, it tries to keep the number
of items specified in the reservation. However, when we are destroying the
UMA zone, we do not need to keep those items. Therefore, when destroying a
non-secondary and non-cache zone, we should reset the keg reservation to 0
prior to draining the zone.

Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27129

Mateusz Guzik [Tue, 10 Nov 2020 18:10:50 +0000 (18:10 +0000)]
Allow rtprio_thread to operate on threads of any process

This in particular unbreaks rtkit.

The limitation was a leftover from an earlier state of the code; to quote a
comment:

/*
 * Though lwpid is unique, only current process is supported
 * since there is no efficient way to look up a LWP yet.
 */

A global tid hash was introduced long ago to remedy
the problem.

Permission checks still apply.

Submitted by: greg_unrelenting.technology (Greg V)
Differential Revision: https://reviews.freebsd.org/D27158

Bryan Drewery [Tue, 10 Nov 2020 18:05:17 +0000 (18:05 +0000)]
makeman: Don't require filemon with MK_DIRDEPS_BUILD.

MFC after: 2 weeks
Reviewed by: sjg, dim (tested earlier version)
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27134

Mateusz Guzik [Tue, 10 Nov 2020 14:23:46 +0000 (14:23 +0000)]
zfs: combine zio caches if possible

This deduplicates 2 sets of caches using the same sizes.

Memory savings fluctuate a lot; in one sample, buildworld on ZFS saved
~180MB of RAM through the reduced page count of the zio caches.

Mateusz Guzik [Tue, 10 Nov 2020 14:21:23 +0000 (14:21 +0000)]
zfs: g/c unused data_alloc_arena

Mateusz Piotrowski [Tue, 10 Nov 2020 14:17:05 +0000 (14:17 +0000)]
Address a mandoc warning

MFC after: 3 days

Hans Petter Selasky [Tue, 10 Nov 2020 12:58:25 +0000 (12:58 +0000)]
Include GID type when deleting GIDs from HW table under RoCE in mlx4ib.
Refer to the Linux commit mentioned below for a more detailed description.

Linux commit:
a18177925c252da7801149abe217c05b80884798

Requested by: Isilon
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking

Mateusz Piotrowski [Tue, 10 Nov 2020 11:32:01 +0000 (11:32 +0000)]
Do not document MOTIFLIB in ports(7)

Perhaps it made sense in 1998 (r32836), but now it feels a bit out of
place.  We tend to avoid documenting non-essential ports variables in
the manual page (we try to document them in the Porter's Handbook instead).

MFC after: 1 week

Mateusz Piotrowski [Tue, 10 Nov 2020 10:40:44 +0000 (10:40 +0000)]
Add an entry for r351863 (honoring ${name}_env in rc(8) scripts)

PR: 239692
Requested by: koobs

Mateusz Piotrowski [Tue, 10 Nov 2020 10:17:11 +0000 (10:17 +0000)]
Add an entry to RELNOTES about renaming ACPI_DMAR to IOMMU

Reviewed by: br (earlier version)
Differential Revision: https://reviews.freebsd.org/D26813

Eugene Grosbein [Tue, 10 Nov 2020 02:26:44 +0000 (02:26 +0000)]
ng_nat: unbreak ABI

Revision r342168 needlessly broke the ABI of ng_nat, and the change
was merged to stable branches, breaking the ABI there, too.
Unbreak it.

PR: 250722
MFC after: 1 week

Mateusz Guzik [Tue, 10 Nov 2020 01:57:48 +0000 (01:57 +0000)]
thread: retire thread_find

tdfind should be used instead.

Mateusz Guzik [Tue, 10 Nov 2020 01:57:19 +0000 (01:57 +0000)]
thread: use tdfind in sysctl_kern_proc_kstack

This trades linear scans for a locked lookup, but more importantly removes
the only consumer of thread_find.

Mateusz Guzik [Tue, 10 Nov 2020 01:31:06 +0000 (01:31 +0000)]
threads: remove the unused TID_BUFFER_SIZE macro

Mateusz Guzik [Tue, 10 Nov 2020 01:13:58 +0000 (01:13 +0000)]
thread: add newer bits for r367537

The committed patch was an older version.

Bjoern A. Zeeb [Mon, 9 Nov 2020 23:36:51 +0000 (23:36 +0000)]
usb_hub: fix whitespace

Fix a whitespace "error" introduced in r367435 noticed when
preparing the MFC.  No functional changes.

Bjoern A. Zeeb [Mon, 9 Nov 2020 23:34:32 +0000 (23:34 +0000)]
arm64: bs_sr_<N> take II

In r367327 generic_bs_sr_<n> were derived from mips.  Given we are calling
generic_bs_w_<n> and not writing directly, we do not have to do the address
calculations ourselves, as generic_bs_w_<n> will do a str val, [bsh, offset].
All we actually have to do is increment offset.

MFC after: 3 days

Mateusz Guzik [Mon, 9 Nov 2020 23:05:28 +0000 (23:05 +0000)]
threads: reimplement tid allocation on top of a bitmap

There are workloads with very bursty tid allocation, and since unr tries very
hard to keep its bitmaps small, it keeps reallocating memory. Just doing a
buildkernel results in almost 150k calls to free coming from unr.

This also gets rid of the hack which tried to postpone TID reuse.

Reviewed by: kib, markj
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27101

Mateusz Guzik [Mon, 9 Nov 2020 23:04:30 +0000 (23:04 +0000)]
threads: introduce a limit for the total number of threads

The intent is to replace the current id allocation method and a known upper
bound will be useful.

Reviewed by: kib (previous version), markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D27100

Mateusz Guzik [Mon, 9 Nov 2020 23:02:13 +0000 (23:02 +0000)]
vfs: group mount per-cpu vars into one struct

While here, move frequently read fields into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by: pho

Mateusz Guzik [Mon, 9 Nov 2020 23:00:29 +0000 (23:00 +0000)]
vmstat: drop the HighUse field from malloc dump

It has been hardwired to "-" since its introduction in 2005.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27141

Mateusz Guzik [Mon, 9 Nov 2020 22:59:41 +0000 (22:59 +0000)]
malloc: provide 384 byte zone

Total page count after buildworld on ZFS for 384 (if present) and 512 zones:
before: 29713
after: 25946

per-zone page use:
vm.uma.malloc_384.keg.domain.1.pages: 11621
vm.uma.malloc_384.keg.domain.0.pages: 11597
vm.uma.malloc_512.keg.domain.1.pages: 1280
vm.uma.malloc_512.keg.domain.0.pages: 1448

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27145

Mateusz Guzik [Mon, 9 Nov 2020 22:58:29 +0000 (22:58 +0000)]
malloc: retire mt_stats_zone in favor of pcpu_zone_64

Reviewed by: markj, imp
Differential Revision: https://reviews.freebsd.org/D27142

Michael Tuexen [Mon, 9 Nov 2020 21:49:40 +0000 (21:49 +0000)]
RFC 7323 specifies that:
* TCP segments without timestamps should be dropped when support for
  the timestamp option has been negotiated.
* TCP segments with timestamps should be processed normally if support
  for the timestamp option has not been negotiated.
This patch enforces the above.

PR: 250499
Reviewed by: gnn, rrs
MFC after: 1 week
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D27148

Emmanuel Vadot [Mon, 9 Nov 2020 13:20:44 +0000 (13:20 +0000)]
Bump __FreeBSD_version after linuxkpi changes

Emmanuel Vadot [Mon, 9 Nov 2020 13:20:14 +0000 (13:20 +0000)]
LinuxKPI: Implement ACPI bits required by drm-kmod in base system

It includes:

- ACPI_HANDLE() implementation.
- AC and VIDEO ACPI events notification support.
- Replacement of hand-rolled GPLed _DSM method evaluation helpers
  with in-base ones.

Submitted by: wulf
Differential Revision: https://reviews.freebsd.org/D26603

Michael Tuexen [Mon, 9 Nov 2020 13:12:07 +0000 (13:12 +0000)]
Fix a potential use-after-free bug introduced in
https://svnweb.freebsd.org/changeset/base/363046

Thanks to Taylor Brandstetter for finding this issue using fuzz testing
and reporting it in https://github.com/sctplab/usrsctp/issues/547

Edward Tomasz Napierala [Mon, 9 Nov 2020 08:53:15 +0000 (08:53 +0000)]
Make it possible to mount a fuse filesystem, such as squashfuse,
from a Linux binary.  This should come in handy for AppImages.

Reviewed by: asomers
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26959

Warner Losh [Mon, 9 Nov 2020 03:02:34 +0000 (03:02 +0000)]
Remove the newline from the bxe description; other drivers don't add one.

Mateusz Guzik [Mon, 9 Nov 2020 00:34:23 +0000 (00:34 +0000)]
Add more per-cpu zones.

This covers powers of 2 up to 64.

One pending consumer is ZFS.

Navdeep Parhar [Mon, 9 Nov 2020 00:08:35 +0000 (00:08 +0000)]
cxgbe(4): Allow the PF driver to set a VF's MAC address.

The MAC address can be set with the optional mac-addr property in the VF
section of the iovctl.conf(5) used to instantiate the VFs.

MFC after: 2 weeks
Sponsored by: Chelsio Communications

Mateusz Guzik [Mon, 9 Nov 2020 00:05:45 +0000 (00:05 +0000)]
vmstat: remove spurious newlines when reporting zones

Mateusz Guzik [Mon, 9 Nov 2020 00:05:21 +0000 (00:05 +0000)]
procdesc: convert the zone to a malloc type

The object is 128 bytes in size.

Mateusz Guzik [Mon, 9 Nov 2020 00:04:58 +0000 (00:04 +0000)]
bufcache: convert bo_numoutput from long to int

int is wide enough and it plugs a hole in struct vnode, taking it down
from 496 to 488 bytes.

Mateusz Guzik [Mon, 9 Nov 2020 00:04:35 +0000 (00:04 +0000)]
kqueue: save space by using only one func pointer for assertions

Navdeep Parhar [Mon, 9 Nov 2020 00:01:13 +0000 (00:01 +0000)]
cxgbev(4): Use the MAC address set by the PF if there is one.

Query the firmware for the MAC address set by the PF for the VF and use
it instead of the firmware generated MAC if it's available.

MFC after: 2 weeks
Sponsored by: Chelsio Communications

Brandon Bergren [Sun, 8 Nov 2020 23:34:06 +0000 (23:34 +0000)]
[PowerPC] Fix powerpc64le boot after HPT superpages addition

The HPT is always stored in big-endian, as it is accessed directly by the
hardware as well as the kernel. As such, it is necessary to convert values
to and from native endian when running on LE.

Some unconverted accesses snuck in accidentally with r367417.

Apply the appropriate conversions to fix boot hanging on powerpc64le.

Sponsored by: Tag1 Consulting, Inc.

Navdeep Parhar [Sun, 8 Nov 2020 22:30:13 +0000 (22:30 +0000)]
cxgbe(4): Add the firmware binaries missing in r367428.

Obtained from: Chelsio Communications
MFC after: 5 days
Sponsored by: Chelsio Communications

Mitchell Horne [Sun, 8 Nov 2020 19:02:22 +0000 (19:02 +0000)]
Fix definition of rn_addmask()

Add the missing static keyword present in the declaration.

Reviewed by: melifaro
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27024

Mitchell Horne [Sun, 8 Nov 2020 18:49:23 +0000 (18:49 +0000)]
igmp: convert igmpstat to use PCPU counters

Currently there is no locking done to protect this structure. This is
likely okay due to the low volume of IGMP traffic, but it allows for
the possibility of underflow. This appears to be one of the last
holdouts of the conversion to counter(9), which was done for most
protocol stat structures around 2013.

This also updates the visibility of this stats structure so that it can
be consumed from elsewhere in the kernel, consistent with the vast
majority of VNET_PCPUSTAT structures.

Reviewed by: kp
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D27023

Richard Scheffenegger [Sun, 8 Nov 2020 18:47:05 +0000 (18:47 +0000)]
Prevent premature SACK block transmission during loss recovery

Under specific conditions, a window update can be sent with
outdated SACK information. Some clients react to this by
subsequently delaying loss recovery, making TCP perform very
poorly.

Reported by: chengc_netapp.com
Reviewed by: rrs, jtl
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D24237

Alexander V. Chernikov [Sun, 8 Nov 2020 18:27:49 +0000 (18:27 +0000)]
Switch net.add_addr_allfibs default to 0.

The goal of the fib support is to provide multiple independent
routing tables, isolated from each other.
The net.add_addr_allfibs default tries to shift gears in the opposite
direction, unconditionally inserting all addresses into all of the fibs.

There are use cases when this is necessary; however, this is not the
expected default behaviour, especially compared to other implementations.

Provide a WARNING message for setups with multiple fibs to notify
potential users of the feature.

Differential Revision: https://reviews.freebsd.org/D26076

Alexander V. Chernikov [Sun, 8 Nov 2020 18:11:12 +0000 (18:11 +0000)]
Temporarily revert setting net.add_addr_allfibs to 0.
It was accidentally swept in with r367486.
Revert to allow for a proper commit message & warning.

Edward Tomasz Napierala [Sun, 8 Nov 2020 15:54:59 +0000 (15:54 +0000)]
Move syscall_thread_{enter,exit}() into the slow path.  This is only
needed for syscalls from unloadable modules.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26988

Mariusz Zaborski [Sun, 8 Nov 2020 14:08:00 +0000 (14:08 +0000)]
Check if the ZVOL has been written before calling zil_async_to_sync.
The ZIL will be opened on the first write, not earlier.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
OpenZFS Pull Request: https://github.com/openzfs/zfs/pull/11152
PR: 250934

3 years agoFix build broken by r367484: add route_ifaddrs.c.
Alexander V. Chernikov [Sun, 8 Nov 2020 13:30:44 +0000 (13:30 +0000)]
Fix build broken by r367484: add route_ifaddrs.c.

Pointy hat to: melifaro
Reported by: jenkins

3 years agoMerge commit 354d3106c from llvm git (by Kai Luo):
Dimitry Andric [Sun, 8 Nov 2020 12:47:35 +0000 (12:47 +0000)]
Merge commit 354d3106c from llvm git (by Kai Luo):

  [PowerPC] Skip combining (uint_to_fp x) if x is not simple type

  Current powerpc64le backend hits
  ```
  Combining: t7: f64 = uint_to_fp t6
  llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291:
  llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() &&
  "Expected a SimpleValueType!"' failed.
  ```
  This patch fixes it by skipping the combination if `t6` is not a simple
  type.
  Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.

  Reviewed By: #powerpc, steven.zhang

  Differential Revision: https://reviews.llvm.org/D88388

This should fix the llvm assertion mentioned above when building the
following ports for powerpc64le:

* audio/traverso
* databases/percona57-pam-for-mysql
* databases/percona57-server
* emulators/citra
* emulators/citra-qt5
* games/7kaa
* graphics/dia
* graphics/mandelbulber
* graphics/pcl-pointclouds
* net-p2p/libtorrent-rasterbar
* textproc/htmldoc

Requested by: pkubaj
MFC after: 3 days

3 years agoMove all ifaddr route creation business logic to net/route/route_ifaddr.c
Alexander V. Chernikov [Sun, 8 Nov 2020 11:12:00 +0000 (11:12 +0000)]
Move all ifaddr route creation business logic to net/route/route_ifaddr.c

Differential Revision: https://reviews.freebsd.org/D26318

3 years ago - add more linux socket options (sorted by value)
Alexander Leidinger [Sun, 8 Nov 2020 09:50:58 +0000 (09:50 +0000)]
 - add more linux socket options (sorted by value)
 - map those IPv4 / IPv6 socket options which exist in FreeBSD
   + most of them visually verified to have the same type/layout of arguments
   + not tested with linux programs to behave as intended
 - be more human readable for known options which are not handled
 - be more verbose for unhandled socket message flags we know about
 - print the jail ID in linux_msg if run in a jail
 - add the ability to print debug messages about known missing parts only once
 - add multiple levels of sysctl linux.debug:
   1: print debug messages, tell about unimplemented stuff (only once)
   2: like 1, but also print messages about implemented but not tested
      stuff (only once)
   3+: like 2, but no rate limiting of messages
 - increase default linux debug level from 1 to 3
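The level semantics listed above can be sketched as a pure decision
function. This is only an illustration: the names (should_print,
msg_kind, once_flag) are hypothetical, level 0 is assumed to disable
these messages, and the real linux_msg() code may be organized
differently.

```c
#include <stdbool.h>

/* Hypothetical message classes; names are illustrative only. */
enum msg_kind { MSG_UNIMPLEMENTED, MSG_UNTESTED };

/* Per-message "already printed" flag used for the once-only limiting. */
struct once_flag { bool printed; };

/*
 * Decide whether a message is printed at a given linux.debug level
 * (assumed semantics):
 *   0  : nothing
 *   1  : unimplemented stuff, once
 *   2  : also implemented-but-untested stuff, once
 *   3+ : like 2, but every time (no rate limiting)
 */
static bool
should_print(int level, enum msg_kind kind, struct once_flag *once)
{
	if (level < 1)
		return (false);
	if (kind == MSG_UNTESTED && level < 2)
		return (false);
	if (level >= 3)
		return (true);
	if (once->printed)
		return (false);
	once->printed = true;
	return (true);
}
```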

We are a lot more verbose than we need to be (e.g. about some of the IP
socket options which are the same, share the same memory layout, and are
believed to work). The reason is that we have no good test suite to test
those linux bits. The LTP and other test suites, like the python one, are
not fully up to the task. Hence the excessive messages about emulated but
untested socket options.

IMO any MFC (possible, but most probably not by me) should set the default
debug level to 1.

Discussed with: trasz

3 years agoloader: cstyle cleanup of bootstrap.h did miss a bit
Toomas Soome [Sun, 8 Nov 2020 09:49:51 +0000 (09:49 +0000)]
loader: cstyle cleanup of bootstrap.h did miss a bit

Correct small issues: a misplaced comment and typos.

3 years agoloader: cstyle cleanup of bootstrap.h
Toomas Soome [Sun, 8 Nov 2020 09:35:41 +0000 (09:35 +0000)]
loader: cstyle cleanup of bootstrap.h

No functional changes intended.

3 years agoReturn the same value for smbios.chassis.maker as smbios.system.maker (and prevents...
Olivier Cochard [Sun, 8 Nov 2020 07:49:39 +0000 (07:49 +0000)]
Return the same value for smbios.chassis.maker as smbios.system.maker (and prevent returning a space character).

Reviewed by: grehan
Approved by: grehan
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D27123

3 years agoimgact_binmisc: limit the extent of match on incoming entries
Kyle Evans [Sun, 8 Nov 2020 04:24:29 +0000 (04:24 +0000)]
imgact_binmisc: limit the extent of match on incoming entries

imgact_binmisc matches magic/mask from imgp->image_header, which is only a
single page in size mapped from the first page of an image. One can specify
an interpreter that matches on, e.g., --offset 4096 --size 256 to read up to
256 bytes past the mapped first page.

The limitation is that we cannot specify a magic string that exceeds a
single page, and we can't allow offset + size to exceed a single page
either.  A static assert has been added in case someone finds it useful to
try and expand the size, but it does seem a little unlikely.
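The restriction described above amounts to a bounds check on the
(offset, size) pair when an entry is added. The sketch below assumes a
4096-byte header page and a hypothetical MAGIC_MAX cap; the names are
illustrative and not the ones used in the kernel. Comparing the offset
against "page size minus size" avoids overflowing offset + size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed size of the mapped image header (one page). */
#define HDR_PAGE_SIZE	4096u
/* Hypothetical cap on the magic length, for illustration only. */
#define MAGIC_MAX	256u

/* Same spirit as the static assert mentioned above. */
static_assert(MAGIC_MAX <= HDR_PAGE_SIZE, "magic must fit in one page");

/*
 * Accept an entry only if the whole [offset, offset + size) window lies
 * within the first page.  Written so that offset + size cannot overflow.
 */
static bool
binmisc_range_ok(size_t offset, size_t size)
{
	if (size > MAGIC_MAX)
		return (false);
	if (offset > HDR_PAGE_SIZE - size)
		return (false);
	return (true);
}
```

With this check, the "--offset 4096 --size 256" example from above is
rejected, while offset 3840 with size 256 (ending exactly at the page
boundary) is still accepted.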

While this looks kind of exploitable at a sideways squinty-glance, there are
a couple of mitigating factors:

1.) imgact_binmisc is not enabled by default,
2.) entries may only be added by the superuser,
3.) trying to exploit this information to read what's mapped past the end
  would be worse than a root canal or some other relatably painful
  experience, and
4.) there's no way one could pull this off without it being completely
  obvious.

The first page is mapped out of an sf_buf, the implementation of which (or
lack thereof) depends on your platform.

MFC after: 1 week

3 years agoAdd collation version support to querylocale(3).
Thomas Munro [Sun, 8 Nov 2020 02:50:34 +0000 (02:50 +0000)]
Add collation version support to querylocale(3).

Provide a way to ask for an opaque version string for a locale_t, so
that potential changes in sort order can be detected.  Similar to
ICU's ucol_getVersion() and Windows' GetNLSVersionEx(), this API is
intended to allow databases to detect when text order-based indexes
might need to be rebuilt.
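A minimal sketch of how a database might consume such a version string,
assuming it stored the string when the index was built. The helper name
is hypothetical. On FreeBSD the current value would be obtained with
something like querylocale(LC_COLLATE_MASK | LC_VERSION_MASK, loc); that
call is left out so the sketch compiles anywhere.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: compare the collation version recorded at index
 * build time with the one the locale reports now.  The version is opaque,
 * so it is only compared for equality, never parsed.
 */
static bool
index_needs_rebuild(const char *stored_version, const char *current_version)
{
	/* Design choice for this sketch: an unknown version counts as a change. */
	if (stored_version == NULL || current_version == NULL)
		return (true);
	return (strcmp(stored_version, current_version) != 0);
}
```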

The CLDR version is extracted from CLDR source data by the Makefile
under tools/tools/locale, written into the machine-generated Makefile
under share/colldef, passed to localedef -V, and then written into
LC_COLLATE file headers.  The initial version is 34.0.
tools/tools/locale was recently updated to pull down 35.0, but the
output hasn't been committed under share/colldef yet, so that will
provide the first observable change when it happens.  Other versioning
schemes are possible in future, because the format is unspecified.

Reviewed by: bapt, 0mp, kib, yuripv (albeit a long time ago)
Differential Revision: https://reviews.freebsd.org/D17166

3 years agoAlso mention PORTS_MODULES
Warner Losh [Sun, 8 Nov 2020 02:46:04 +0000 (02:46 +0000)]
Also mention PORTS_MODULES

PORTS_MODULES is also an effective way to update the tree. Also
a minor rejustify on this and an adjacent paragraph.

Suggested by: David Wolfskill

3 years agoBe explicit about recompiling all the modules...
Warner Losh [Sun, 8 Nov 2020 02:20:21 +0000 (02:20 +0000)]
Be explicit about recompiling all the modules...

Add a note about always recompiling all modules on every new kernel
change / update. In addition, suggest using /usr/local/sys/modules
so this happens automatically.

3 years agoUpdate to bmake-20201101
Simon J. Gerraty [Sat, 7 Nov 2020 21:46:27 +0000 (21:46 +0000)]
Update to bmake-20201101

Lots of new unit-tests increase code coverage.

Lots of refactoring, cleanup and simplification to reduce
code size.

Fixes for Bugs 223564 and 245807

Updates to dirdeps.mk and meta2deps.py

3 years agoThe ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
Michael Tuexen [Sat, 7 Nov 2020 21:17:49 +0000 (21:17 +0000)]
The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
access the socket send or receive buffer. This is not possible for
listening sockets since r319722.
Since send()/recv() calls fail on listening sockets, make these ioctl()
calls fail as well, returning EINVAL.
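A small userland sketch of the new behaviour: put a TCP socket into the
listening state and query FIONREAD. On a kernel with this change the
ioctl(2) is expected to fail with EINVAL. The helper name is
hypothetical; it returns -1 if the socket cannot be set up at all (for
example inside a restricted sandbox).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a listening TCP socket and issue FIONREAD on it.
 * Returns the errno reported by ioctl(2) (0 if it succeeded),
 * or -1 if the socket could not be created, bound or put into listen.
 */
static int
fionread_on_listener(void)
{
	struct sockaddr_in sin;
	int s, n, err;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = 0;		/* let the kernel pick a port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 1) == -1) {
		close(s);
		return (-1);
	}
	/* Capture errno before close(2) can overwrite it. */
	err = (ioctl(s, FIONREAD, &n) == -1) ? errno : 0;
	close(s);
	return (err);
}
```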

PR: 250366
Reported by: Yong-Hao Zou
Reviewed by: glebius, rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D26897

3 years agoImport bmake-20201101
Simon J. Gerraty [Sat, 7 Nov 2020 19:39:21 +0000 (19:39 +0000)]
Import bmake-20201101

Lots of new unit-tests increase code coverage.

Lots of refactoring, cleanup and simplification to reduce
code size.

Fixes for Bugs 223564 and 245807

Updates to dirdeps.mk and meta2deps.py

3 years agoFix build post-r367455.
Cy Schubert [Sat, 7 Nov 2020 19:17:37 +0000 (19:17 +0000)]
Fix build post-r367455.

MFC after: 2 weeks
X-MFC with: r367455

3 years agoimgact_binmisc: move some calculations out of the exec path
Kyle Evans [Sat, 7 Nov 2020 18:07:55 +0000 (18:07 +0000)]
imgact_binmisc: move some calculations out of the exec path

The offset we need to account for in the interpreter string comes in two
variants:

1. Fixed - macros other than #a that will not vary from invocation to
   invocation
2. Variable - #a, which is substituted with the argv0 that we're replacing

Note that we don't have a mechanism to modify an existing entry.  By
recording both of these offset requirements when the interpreter is added,
we can avoid some unnecessary calculations in the exec path.

Most importantly, we can know up-front whether we need to calculate/grab
the filename for this interpreter. We also get to avoid
walking the string a first time looking for macros. For most invocations,
it's a swift exit as they won't have any, but there's no point entering a
loop and searching for the macro indicator if we already know there will not
be one.

While we're here, go ahead and only calculate the argv0 name length once per
invocation. While it's unlikely that we'll have more than one #a, there's no
reason to recalculate it every time we encounter an #a when it will not
change.
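The split between fixed and variable offsets can be sketched as follows:
scan the interpreter string once when the entry is added, then compute
the expanded length in the exec path without walking the string again
and with at most one strlen() of argv0. The struct and function names
are hypothetical, and for illustration "##" is assumed to expand to a
literal '#' (a fixed macro) while "#a" is the only variable macro.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry data recorded once, when the interpreter is added. */
struct interp_info {
	size_t		literal_len;	/* length with fixed macros expanded, #a excluded */
	unsigned	argv0_cnt;	/* number of #a occurrences */
};

/* Registration time: walk the interpreter string exactly once. */
static void
interp_precompute(const char *s, struct interp_info *ii)
{
	ii->literal_len = 0;
	ii->argv0_cnt = 0;
	for (; *s != '\0'; s++) {
		if (s[0] == '#' && s[1] == 'a') {
			ii->argv0_cnt++;	/* variable: depends on argv0 */
			s++;
		} else if (s[0] == '#' && s[1] == '#') {
			ii->literal_len++;	/* fixed: "##" -> "#" */
			s++;
		} else {
			ii->literal_len++;
		}
	}
}

/* Exec path: no string walk, and strlen(argv0) is computed at most once. */
static size_t
interp_expanded_len(const struct interp_info *ii, const char *argv0)
{
	size_t argv0_len;

	if (ii->argv0_cnt == 0)
		return (ii->literal_len);	/* argv0 is never needed */
	argv0_len = strlen(argv0);
	return (ii->literal_len + ii->argv0_cnt * argv0_len);
}
```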

I have not bothered trying to benchmark this at all, because it's arguably a
minor and straightforward/obvious improvement.

MFC after: 1 week

3 years agosyslogd: Stop trying to send remote messages through special sockets
Bryan Drewery [Sat, 7 Nov 2020 17:18:44 +0000 (17:18 +0000)]
syslogd: Stop trying to send remote messages through special sockets

Specifically, sendmsg(2) was being called on the /dev/klog fd and on the
signal pipe handling fd, which always returned [ENOTSOCK].

r310350 combined these sockets into the main socket list and properly
skipped AF_UNSPEC at the sendmsg(2) call, but r344739 later broke this
such that these special sockets were no longer excluded, since it was the
AF_UNSPEC check that specifically excluded them. Only these special
sockets have sl_sa = NULL. The sl_family checks should be redundant now,
but they are left in place in case of future changes, so that the intent
is clearer.
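The exclusion rule can be sketched with a trimmed-down stand-in for the
socket list entry. The struct below is hypothetical (the real syslogd
entry is organized differently); it only models the two fields the
message talks about, sl_sa and sl_family.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for a syslogd socket list entry. */
struct socklist {
	struct sockaddr	*sl_sa;		/* NULL for /dev/klog and the signal pipe */
	int		 sl_family;	/* AF_UNSPEC for those same special entries */
	int		 sl_socket;
};

/*
 * Return nonzero if a remote message may be sent with sendmsg(2) through
 * this entry.  The sl_sa test is what excludes the special descriptors;
 * the sl_family test is redundant but documents the intent.
 */
static int
sl_can_sendmsg(const struct socklist *sl)
{
	if (sl->sl_sa == NULL)
		return (0);
	if (sl->sl_family == AF_UNSPEC)
		return (0);
	return (1);
}
```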

MFC after: 2 weeks

3 years agozfs: remove 2 assertions that teardown lock is not held
Mateusz Guzik [Sat, 7 Nov 2020 16:58:38 +0000 (16:58 +0000)]
zfs: remove 2 assertions that teardown lock is not held

They are not very useful and hard to implement with rms.

This has a side effect of simplifying the code.

3 years agorms: several cleanups + debug read lockers handling
Mateusz Guzik [Sat, 7 Nov 2020 16:57:53 +0000 (16:57 +0000)]
rms: several cleanups + debug read lockers handling

This adds a dedicated counter, updated with atomics when INVARIANTS
is used. As a side effect, one can reliably determine whether the lock is
held for reading by at least one thread, but it's still not possible to
find out whether curthread holds the lock in that mode.

This should be good enough in practice.
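The trade-off can be illustrated with a user-space sketch of such a
debug-only reader counter, using C11 atomics. The names are hypothetical
and the lock itself is omitted: because the counter records no owner, it
can answer "held for reading by someone?" but not "held by this thread?".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical debug state kept next to the lock (INVARIANTS-only in spirit). */
struct rms_debug {
	atomic_int	readers;
};

static void
rms_debug_rlock(struct rms_debug *d)
{
	/* Relaxed ordering: the counter is only a debugging aid. */
	atomic_fetch_add_explicit(&d->readers, 1, memory_order_relaxed);
}

static void
rms_debug_runlock(struct rms_debug *d)
{
	int old;

	old = atomic_fetch_sub_explicit(&d->readers, 1, memory_order_relaxed);
	assert(old > 0);	/* catches unlock without a matching lock */
	(void)old;
}

/* Answerable: is the lock read-held by at least one thread? */
static bool
rms_debug_rowned_by_someone(struct rms_debug *d)
{
	return (atomic_load_explicit(&d->readers, memory_order_relaxed) > 0);
}
```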

Problem spotted by avg.

3 years agoimgact_binmisc: reorder members of struct imgact_binmisc_entry (NFC)
Kyle Evans [Sat, 7 Nov 2020 16:41:59 +0000 (16:41 +0000)]
imgact_binmisc: reorder members of struct imgact_binmisc_entry (NFC)

This doesn't change anything at the moment since the out-of-order elements
were a pair of uint32_t, but future additions may have caused unnecessary
padding by following the existing precedent.
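The padding concern can be illustrated with two hypothetical layouts
(the member names are made up, not those of struct
imgact_binmisc_entry). On a typical LP64 platform, placing a 32-bit
member before a pointer forces alignment padding, while grouping the two
uint32_t members together lets them share a single 8-byte slot.

```c
#include <stdint.h>

/* Hypothetical layout: a pointer between two uint32_t members. */
struct entry_unordered {
	uint32_t	 flags;
	void		*name;	/* pointer alignment forces padding before it */
	uint32_t	 size;	/* tail padding rounds the struct up again */
};

/* Same members, largest alignment first: no interior padding needed. */
struct entry_ordered {
	void		*name;
	uint32_t	 flags;	/* the two uint32_t share one 8-byte slot */
	uint32_t	 size;
};
```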

MFC after: 1 week

3 years agovt: resolve conflict between VT_ALT_TO_ESC_HACK and DBG
Kyle Evans [Sat, 7 Nov 2020 15:38:01 +0000 (15:38 +0000)]
vt: resolve conflict between VT_ALT_TO_ESC_HACK and DBG

When using the ALT+CTRL+ESC sequence to break into kdb, the keyboard is
completely borked when you return. watch(8) shows that it's working, but
it's inserting escape sequences.

Further investigation revealed that VT_ALT_TO_ESC_HACK is the default and
directly conflicts with this sequence, so upon return from the debugger
ALKED is set.

If the user triggered the break into the debugger, it's safe to assume they
didn't mean to use VT_ALT_TO_ESC_HACK, so just unset it to avoid the
surprise of a seemingly non-functional keyboard upon return.

Reviewed by: tsoome
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27109

3 years agoAdd a method to determine whether given interrupt is per CPU or not.
Michal Meloun [Sat, 7 Nov 2020 14:58:01 +0000 (14:58 +0000)]
Add a method to determine whether given interrupt is per CPU or not.

MFC after: 2 weeks

3 years agoMove TDB_USERWR check under 'if (traced)'.
Edward Tomasz Napierala [Sat, 7 Nov 2020 13:09:51 +0000 (13:09 +0000)]
Move TDB_USERWR check under 'if (traced)'.

If we hadn't been traced in the first place when syscallenter()
started executing, we can ignore TDB_USERWR.  TDB_USERWR can get set,
sure, but if it does, it's because the debugger raced with the syscall,
and it cannot depend on winning that race.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: EPSRC
Differential Revision: https://reviews.freebsd.org/D26585