https://www.illumos.org/issues/10592
This is a collection of recent fixes from ZoL: 8eef997679b Error path in metaslab_load_impl() forgets to drop ms_sync_lock 928e8ad47d3 Introduce auxiliary metaslab histograms 425d3237ee8 Get rid of space_map_update() for ms_synced_length 6c926f426a2 Simplify log vdev removal code 21e7cf5da89 zdb -L should skip leak detection altogether df72b8bebe0 Rename range_tree_verify to range_tree_verify_not_present 75058f33034 Remove unused vdev_t fields
Andrew Turner [Thu, 21 Nov 2019 11:22:08 +0000 (11:22 +0000)]
Port the NetBSD KCSAN runtime to FreeBSD.
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.
This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.
10601 Pool allocation classes
https://www.illumos.org/issues/10601
illumos port of ZoL Pool allocation classes. Includes at least these two
commits: 441709695 Pool allocation classes misplacing small file blocks cc99f275a Pool allocation classes
10757 Add -gLp to zpool subcommands for alt vdev names
https://www.illumos.org/issues/10757
Port from ZoL of d2f3e292d Add -gLp to zpool subcommands for alt vdev names
Note that a subsequent ZoL commit changed -p to -P a77f29f93 Change full path subcommand flag from -p to -P
Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: HÃ¥kan Johansson <f96hajo@chalmers.se>
Portions contributed by: Richard Yao <ryao@gentoo.org>
Portions contributed by: Chunwei Chen <david.chen@nutanix.com>
Portions contributed by: loli10K <ezomori.nozomu@gmail.com>
Author: Don Brady <don.brady@delphix.com>
11541 allocation_classes feature must be enabled to add log device
https://www.illumos.org/issues/11541
After the allocation_classes feature was integrated, one can no longer add a
log device to a pool unless that feature is enabled. There is an explicit check
for this, but it is unnecessary in the case of log devices, so we should handle
this better instead of forcing the feature to be enabled.
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
FreeBSD notes.
I faithfully added the new -g, -L, -P flags, but only -g does something:
vdev GUIDs are displayed instead of device names. -L, resolve symlinks,
and -P, display full disk paths, do nothing at the moment.
The use of special vdevs is backward compatible for read-only access, so
root pools should be bootable, but exercise caution.
Ed Maste [Thu, 21 Nov 2019 03:10:02 +0000 (03:10 +0000)]
mark arm.arm (v4/v5) kernels as NO_UNIVERSE for now
r354290 removed arm.arm from universe, but arm.arm kernels were still
found and built during the kernel stage. I'm not aware of a better way
to address this at the moment, but since there aren't many arm.arm
kernels anyhow just add an explicit NO_UNIVERSE to them.
Kyle Evans [Thu, 21 Nov 2019 02:49:41 +0000 (02:49 +0000)]
bcm2835_sdhci: clean up DMA segments in error handling path
Later parts assume that this would've been done if interrupts are enabled,
but this is the only case in which that wouldn't have been true. This commit
also reorders operations such that we're done touching slot/slot->intmask
before we call back into the SDHCI framework and exit.
Kyle Evans [Thu, 21 Nov 2019 02:47:55 +0000 (02:47 +0000)]
bcm2835_sdhci: roll back r354823
r354823 kicked DATA_END handling out of the DMA interrupt path "to make
things easy", but this was likely a mistake -- if we know we're done after
we've finished pending DMA operations, we should go ahead and acknowledge
it rather than waiting for the controller to finalize it. If it's not ready,
we'll simply re-enable interrupts and wait for it anyways, to be re-entered
in sdhci_data_intr.
Kyle Evans [Thu, 21 Nov 2019 02:41:22 +0000 (02:41 +0000)]
bcm2835_sdhci: clean up DMA segments in error handling path
Later parts assume that this would've been done if interrupts are enabled,
but this is the only case in which that wouldn't have been true. This commit
also reorders operations such that we're done touching slot/slot->intmask
before we call back into the SDHCI framework and exit.
Warner Losh [Wed, 20 Nov 2019 23:58:36 +0000 (23:58 +0000)]
Add --esp/-E argument to print the currently booted ESP
Add code to decode the BootCurrent and BootXXXX variable it points at
to deduce the ESP used to boot the system. By default, it prints the
path to that device. With --unix-path (-p) it will instead print the
current mount point for the ESP, if any (or an error). With
--device-path (-d) it wil print the UEFI device path for the ESP.
Note: This is the best guess based on the UEFI variables. If the ESP
is part of a gmirror, etc, that won't be reported. If by some weird
chance there was a complicated series of chain boots, this may not be
what you want. For setups that don't add layers on top of the raw
devices, it is accurate.
Warner Losh [Wed, 20 Nov 2019 23:45:31 +0000 (23:45 +0000)]
Create /etc/os-release file.
Each boot, regenerate /var/run/os-release based on the currently running
system. Create a /etc/os-release symlink pointing to this file (so that this
doesn't create a new reason /etc can not be mounted read-only).
This is compatible with what other systems do and is what the sysutil/os-release
port attempted to do, but in an incomplete way. Linux, Solaris and DragonFly all
implement this natively as well. The complete standard can be found at
https://www.freedesktop.org/software/systemd/man/os-release.html
Moving this to the base solves both the non-standard location problem with the
port, as well as the lack of update of this file on system update.
Warner Losh [Wed, 20 Nov 2019 21:06:29 +0000 (21:06 +0000)]
Standardize EFI's ESP mount point.
Mount the UEFI ESP on /boot/efi. No current system uses this by default, but
there are many ad-hoc schemes that do this in /efi or /esp or /uefi and adding a
new directory at the top-level would have a much higher likelihood of
collision. Document this in /etc/mtree/BSD.root.mtree and create EFIDIR and
related variables in bsd.own.mk.
Brooks Davis [Wed, 20 Nov 2019 18:36:58 +0000 (18:36 +0000)]
Make the warning for deprecated NO_ variables an error.
Support for NO_CTF, NO_DEBUG_FILES, NO_INSTALLLIB, NO_MAN, NO_PROFILE,
and NO_WARNS as deprecated in 2014 with a warning added for each one
found. Turn these into error in preperation for removal of compatability
support before FreeBSD 13.
Dimitry Andric [Wed, 20 Nov 2019 18:12:01 +0000 (18:12 +0000)]
Add explanatory comments for the different SRCS_xxx variables used in
the Makefiles for libllvm and libclang. While here, cleanup a commented
out SRCS entry in libllvmminimal's Makefile.
Ed Maste [Wed, 20 Nov 2019 17:57:46 +0000 (17:57 +0000)]
src.conf.5: regen for several recent changes
r354289 armv6: Switch to LLD by default
r354290 Take arm.arm (armv5) out of universe
r354348 armv6, armv7: Switch to llvm-libunwind by default
r354660 Enable the RISC-V LLVM backend by default.
Ed Maste [Wed, 20 Nov 2019 17:37:45 +0000 (17:37 +0000)]
disable amd(8) by default
As of FreeBSD 10.1 the autofs(5) is available for automounting, and the
amd man page has indicated that the in-tree copy of amd is obsolete.
Disable it by default for now, with the expectation that it will be
removed before FreeBSD 13.0.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22460
Ed Maste [Wed, 20 Nov 2019 16:30:37 +0000 (16:30 +0000)]
sshd: make getpwclass wrapper MON_ISAUTH not MON_AUTH
In r339216 a privsep wrapper was added for login_getpwclass to address
PR 231172. Unfortunately the change used the MON_AUTH flag in the
wrapper, and MON_AUTH includes MON_AUTHDECIDE which triggers an
auth_log() on each invocation. getpwclass() does not participate in the
authentication decision, so should be MON_ISAUTH instead.
PR: 234793
Submitted by: Henry Hu
Reviewed by: Yuichiro NAITO
MFC after: 1 week
Doug Moore [Wed, 20 Nov 2019 16:06:48 +0000 (16:06 +0000)]
Instead of looking up a predecessor or successor to the current map
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
Andrew Turner [Wed, 20 Nov 2019 14:37:48 +0000 (14:37 +0000)]
Import the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime.
KCSAN is a tool to find concurrent memory access that may race each other.
After a determined number of memory accesses a cell is created, this
describes the current access. It will then delay for a short period
to allow other CPUs a chance to race. If another CPU performs a memory
access to an overlapping region during this delay the race is reported.
This is a straight import of the NetBSD code, it will be adapted to
FreeBSD in a future commit.
Mateusz Guzik [Wed, 20 Nov 2019 12:05:59 +0000 (12:05 +0000)]
vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.
There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).
Reviewed by: kib, jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22202
amd64: in double fault handler, do not rely on sane gsbase value.
Typical reasons for doublefault faults are either kernel stack
overflow or bugs in the code that manipulates protection CPU state.
The later code is the code which often has to set up gsbase for
kernel. Switching to explicit load of GSBASE MSR in the fault handler
makes it more probable to output a useful information.
Now all IST handlers have nmi_pcpu structure on top of their stacks.
It would be even more useful to save gsbase value at the moment of the
fault. I did not this because I do not want to modify PCB layout now.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Kyle Evans [Wed, 20 Nov 2019 05:04:44 +0000 (05:04 +0000)]
bcm2835_vcbus: add compatibility name for ^/sys/contrib/vchiq
It's unclear how this didn't get caught in my last iteration, but the fix is
easy- the interface is still compatible, it was just gratuituously renamed
to match my arbitrary definition of consistency... VCBUS, the BCM2835 name,
represents an address on the VideoCore CPU Bus.
In a similar fashion, while it is a physical address, the ARMC portion
represents that these are addresses as seen by the ARM CPU.
To make things even more fun, the BCM2711 peripheral documentation describes
not virtual address space vs. physical address space, but instead the 32-bit
address map vs. the address map in "Low Peripheral" mode. The latter of
these is what the *ARMC* macros translate to/from.
Kyle Evans [Wed, 20 Nov 2019 03:57:46 +0000 (03:57 +0000)]
bcm2835: push address mapping conversion for DMA/mailbox to runtime
We could maintain the static conversions for the !AArch64 Raspberry Pis, but
I'm not sure it's worth it -- we'll traverse the platform list exactly once
(of which there are only two for armv7), then every conversion there-after
traverses the memory map listing of which there are at-most two entries for
these boards: sdram and peripheral space.
Detecting this at runtime is necessary for the AArch64 SOC, though, because
of the distinct IO windows being otherwise not discernible just from support
compiled into the kernel. We currently select the correct window based on
/compatible in the FDT.
We also use a similar mechanism to describe the DMA restrictions- the RPi 4
can have up to 4GB of RAM while the DMA controller and mailbox mechanism can
technically, kind of, only access the lowest 1GB. See the comment in
bcm2835_vcbus.h for a fun description/clarification of this.
Jeff Roberson [Wed, 20 Nov 2019 01:57:33 +0000 (01:57 +0000)]
When we set OFFPAGE to limit fragmentation we should also set VTOSLAB
so that we avoid the hashtables. The hashtable is now only required if
a zone is created with OFFPAGE specified initially, not internally. This
flag signals to UMA that it can't touch the allocated memory and so
can't store a slab pointer in the containing page.
Jeff Roberson [Tue, 19 Nov 2019 23:19:43 +0000 (23:19 +0000)]
Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.
Kyle Evans [Tue, 19 Nov 2019 23:12:43 +0000 (23:12 +0000)]
bcm2835_sdhci: various refactoring of DMA path
This round of refactoring is mostly about streamlining the interrupt handler
to make it easier to verify and reason about operations taking place while
trying to bring FreeBSD up on the RPi4.
bhyve: virtio-net: disable receive until features are negotiated
This patch fixes a race condition where the receive callback is called
while the device is being reset. Since the rx_merge variable may change
during reset, the receive callback may operate inconsistently with what
the guest expects.
Also, get rid of the unused rx_vhdrlen variable.
Bjoern A. Zeeb [Tue, 19 Nov 2019 21:08:18 +0000 (21:08 +0000)]
nd6: sysctl
Move the SYSCTL_DECL to the top of the file. Move the sysctl function
before SYSCTL_PROC so that we don't need an extra function declaration in
the middle of the file.
Bjoern A. Zeeb [Tue, 19 Nov 2019 20:34:33 +0000 (20:34 +0000)]
nd6_rtr: re-sort functions
Resort functions within file in a way that they depend on each other as
that makes it easier to rework various things.
Also allows us to remove file local function declarations.
Alan Cox [Tue, 19 Nov 2019 19:05:05 +0000 (19:05 +0000)]
Achieve two goals at once: (1) Avoid an unnecessary broadcast TLB
invalidation in reclaim_pv_chunk(). (2) Prevent an "invalid ASID" assertion
failure in reclaim_pv_chunk(). The detailed explanation for this change is
provided by r354792.
Bjoern A. Zeeb [Tue, 19 Nov 2019 15:38:55 +0000 (15:38 +0000)]
Reduce the vnet_set module size of ip_mroute to allow loading as a module.
With VIMAGE kernels modules get special treatment as they need
to also keep the original values and make copies for each instance.
For that a few pages of vnet modspace are provided and the
kernel-linker and the VNET framework know how to deal with things.
When the modspace is (almost) full, other modules which would
overflow the modspace cannot be loaded and kldload will fail.
ip_mroute uses a lot of variable space, mostly be four big arrays:
set_vnet 0000000000000510 vnet_entry_multicast_register_if
set_vnet 0000000000000700 vnet_entry_viftable
set_vnet 0000000000002000 vnet_entry_bw_meter_timers
set_vnet 0000000000002800 vnet_entry_bw_upcalls
Dynamically malloc the three big ones for each instance we need
and free them again on vnet teardown (the 4th is an ifnet).
That way they only need module space for a single pointer and
allow a lot more modules using virtualized variables to be loaded
on a VNET kernel.
Bjoern A. Zeeb [Tue, 19 Nov 2019 14:53:13 +0000 (14:53 +0000)]
mld: fix epoch assertion
in6ifa_ifpforlinklocal() asserts the net epoch. The test case from r354832
revealed code paths where we call into the function without having
acquired the net epoch first and consequently we hit the assert.
This happens in certain MLD states during VNET shutdown and most people
normaly not notice this.
For correctness acquire the net epoch around calls to
mld_v1_transmit_report() in all cases to avoid the assertion firing.
David Bright [Tue, 19 Nov 2019 14:46:28 +0000 (14:46 +0000)]
Don't sanitize linker_set
The assumptions of linker_set don't play nicely with
AddressSanitizer. AddressSanitizer adds a 'redzone' of zeros around
globals (including those in named sections), whereas linker_set
assumes they are all packed consecutively like a pointer array. So:
let's annotate linker_set so that AddressSanitizer ignores it.
Doug Moore [Tue, 19 Nov 2019 08:06:31 +0000 (08:06 +0000)]
Drop the extra argument from swp_pager_meta_ctl and have it do lookup
only. Rename it swp_pager_meta_lookup. Stop checking for obj->type
== swap there and assert it instead. Make the caller responsible for
the obj->type check.
Move the meta_ctl 'pop' functionality to swap_pager_unswapped, the
only place that uses it, and assume obj->type == swap there too.
Conrad Meyer [Tue, 19 Nov 2019 04:30:23 +0000 (04:30 +0000)]
unifdef(1): Improve worst-case bound on symbol resolution
Use RB_TREE to make some algorithms O(lg N) and O(N lg N) instead of O(N)
and O(N^2). Because N is typically small and the former linear array also has
great constant factors (as a property of CPU caching), this doesn't provide
material benefit most or all of the time.
While here, remove arbitrarily limit on number of macros understood.
This allows easy and care-free scaling of NUM_DMA_SEGS with proper-ish
calculations to make sure we can actually handle the number of segments we'd
like to handle on average so that performance comparisons can be easily made
at different values if/once we can actually handle it. It also makes it
helps the untrained reader understand more quickly the reasoning behind the
choice of maxsize/maxsegs/maxsegsize.
Justin Hibbits [Tue, 19 Nov 2019 02:00:13 +0000 (02:00 +0000)]
powerpc/pmap: Remove an unused error from moea64_pvo_enter()
ENOENT is leftover from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page. This sync is done
differently in OEA64.
Justin Hibbits [Tue, 19 Nov 2019 01:28:06 +0000 (01:28 +0000)]
powerpc/booke pmap: Use the right 'tlbilx' form to invalidate TIDs
'tlbilxpid' is 'tlbilx 1, 0', while the existing form is 'tlbilx 0, 0',
which translates to 'tlbilxlpid', invalidating a LDPID. This effectively
invalidates the entire TLB, causing unnecessary reloads.
Kyle Evans [Mon, 18 Nov 2019 23:31:12 +0000 (23:31 +0000)]
sysent: regenerate after r354835
The lua-based makesyscalls produces slightly different output than its
makesyscalls.sh predecessor, all whitespace differences more closely
matching the source syscalls.master.
Kyle Evans [Mon, 18 Nov 2019 23:28:23 +0000 (23:28 +0000)]
Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.
Kyle Evans [Mon, 18 Nov 2019 23:21:13 +0000 (23:21 +0000)]
Add flua to the base system, install to /usr/libexec
FreeBSDlua ("flua") is a FreeBSD-private lua, flavored with whatever
extensions we need for base system operations. We currently support a subset
of lfs and lposix that are used in the rewrite of makesyscall.sh into lua,
added in r354786.
flua is intentionally written such that one can install standard lua and
some set of lua modules from ports and achieve the same effect.
linit_flua is a copy of linit.c from contrib/lua with lfs and lposix added
in. This is similar to what we do in stand/. linit.c has been renamed to
make it clear that this has flua-specific bits.
luaconf has been slightly obfuscated to make extensions more difficult. Part
of the problem is that flua is already hard enough to use as a bootstrap
tool because it's not in PATH- attempting to do extension loading would
require a special bootstrap version of flua with paths changed to protect
the innocent.
src.lua.mk has been added to make it easy for in-tree stuff to find flua,
whether it's bootstrap-flua or relying on PATH frobbing by Makefile.inc1.
Bjoern A. Zeeb [Mon, 18 Nov 2019 21:59:47 +0000 (21:59 +0000)]
icmpv6: Fix mbuf change in mld
After r354748 mld_input() can change the mbuf. The new pointer
is never returned to icmp6_input() and when passed to
icmp6_rip6_input() the mbuf may no longer valid leading to
a panic.
Pass a pointer to the mbuf to mld_input() so we can return an
updated version in the non-error case.
Add a test sending an MLD packet case which will trigger this bug.
bus_dma_dmar_set_buswide(9): KPI to indicate that the whole dmar
context should share page tables.
Practically it means that dma requests from any device on the bus are
translated according to the entries loaded for the bus:0:0 device.
KPI requires that the slot and function of the device be 0:0, and that
no tags for other devices on the bus were used.
The intended use are NTBs which pass TLPs from the downstream to the
host with slot:func of the downstream originator.
Reviewed and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22434
Mark Johnston [Mon, 18 Nov 2019 20:03:28 +0000 (20:03 +0000)]
Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory. However, it
increases fragmentation and is disabled in release kernels.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Kyle Evans [Mon, 18 Nov 2019 19:28:09 +0000 (19:28 +0000)]
bcm2835_sdhci: use a macro for interrupts we handle
This is just further simplification, very little functional change. In the
DMA interrupt handler, we *do* now acknowledge both DATA_AVAIL | SPACE_AVAIL
every time -- these operations are mutually exclusive, so while this is a
functional change, it's effectively a nop. Removing the 'mask' local allows
us to further simplify in a future change.