imp [Wed, 20 Nov 2019 23:45:31 +0000 (23:45 +0000)]
Create /etc/os-release file.
Each boot, regenerate /var/run/os-release based on the currently running
system. Create a /etc/os-release symlink pointing to this file (so that this
doesn't create a new reason /etc can not be mounted read-only).
This is compatible with what other systems do and is what the sysutil/os-release
port attempted to do, but in an incomplete way. Linux, Solaris and DragonFly all
implement this natively as well. The complete standard can be found at
https://www.freedesktop.org/software/systemd/man/os-release.html
Moving this to the base solves both the non-standard location problem with the
port, as well as the lack of update of this file on system update.
imp [Wed, 20 Nov 2019 21:06:29 +0000 (21:06 +0000)]
Standardize EFI's ESP mount point.
Mount the UEFI ESP on /boot/efi. No current system uses this by default, but
there are many ad-hoc schemes that do this in /efi or /esp or /uefi and adding a
new directory at the top-level would have a much higher likelihood of
collision. Document this in /etc/mtree/BSD.root.mtree and create EFIDIR and
related variables in bsd.own.mk.
brooks [Wed, 20 Nov 2019 18:36:58 +0000 (18:36 +0000)]
Make the warning for deprecated NO_ variables an error.
Support for NO_CTF, NO_DEBUG_FILES, NO_INSTALLLIB, NO_MAN, NO_PROFILE,
and NO_WARNS as deprecated in 2014 with a warning added for each one
found. Turn these into error in preperation for removal of compatability
support before FreeBSD 13.
dim [Wed, 20 Nov 2019 18:12:01 +0000 (18:12 +0000)]
Add explanatory comments for the different SRCS_xxx variables used in
the Makefiles for libllvm and libclang. While here, cleanup a commented
out SRCS entry in libllvmminimal's Makefile.
emaste [Wed, 20 Nov 2019 17:57:46 +0000 (17:57 +0000)]
src.conf.5: regen for several recent changes
r354289 armv6: Switch to LLD by default
r354290 Take arm.arm (armv5) out of universe
r354348 armv6, armv7: Switch to llvm-libunwind by default
r354660 Enable the RISC-V LLVM backend by default.
emaste [Wed, 20 Nov 2019 17:37:45 +0000 (17:37 +0000)]
disable amd(8) by default
As of FreeBSD 10.1 the autofs(5) is available for automounting, and the
amd man page has indicated that the in-tree copy of amd is obsolete.
Disable it by default for now, with the expectation that it will be
removed before FreeBSD 13.0.
Reviewed by: kevans
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22460
emaste [Wed, 20 Nov 2019 16:30:37 +0000 (16:30 +0000)]
sshd: make getpwclass wrapper MON_ISAUTH not MON_AUTH
In r339216 a privsep wrapper was added for login_getpwclass to address
PR 231172. Unfortunately the change used the MON_AUTH flag in the
wrapper, and MON_AUTH includes MON_AUTHDECIDE which triggers an
auth_log() on each invocation. getpwclass() does not participate in the
authentication decision, so should be MON_ISAUTH instead.
PR: 234793
Submitted by: Henry Hu
Reviewed by: Yuichiro NAITO
MFC after: 1 week
dougm [Wed, 20 Nov 2019 16:06:48 +0000 (16:06 +0000)]
Instead of looking up a predecessor or successor to the current map
entry, when that entry has been seen already, keep the
already-looked-up value in a variable and use that instead of looking
it up again.
andrew [Wed, 20 Nov 2019 14:37:48 +0000 (14:37 +0000)]
Import the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime.
KCSAN is a tool to find concurrent memory access that may race each other.
After a determined number of memory accesses a cell is created, this
describes the current access. It will then delay for a short period
to allow other CPUs a chance to race. If another CPU performs a memory
access to an overlapping region during this delay the race is reported.
This is a straight import of the NetBSD code, it will be adapted to
FreeBSD in a future commit.
mjg [Wed, 20 Nov 2019 12:05:59 +0000 (12:05 +0000)]
vfs: change si_usecount management to count used vnodes
Currently si_usecount is effectively a sum of usecounts from all associated
vnodes. This is maintained by special-casing for VCHR every time usecount is
modified. Apart from complicating the code a little bit, it has a scalability
impact since it forces a read from a cacheline shared with said count.
There are no consumers of the feature in the ports tree. In head there are only
2: revoke and devfs_close. Both can get away with a weaker requirement than the
exact usecount, namely just the count of active vnodes. Changing the meaning to
the latter means we only need to modify it on 0<->1 transitions, avoiding the
check plenty of times (and entirely in something like vrefact).
Reviewed by: kib, jeff
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D22202
kib [Wed, 20 Nov 2019 11:12:19 +0000 (11:12 +0000)]
amd64: in double fault handler, do not rely on sane gsbase value.
Typical reasons for doublefault faults are either kernel stack
overflow or bugs in the code that manipulates protection CPU state.
The later code is the code which often has to set up gsbase for
kernel. Switching to explicit load of GSBASE MSR in the fault handler
makes it more probable to output a useful information.
Now all IST handlers have nmi_pcpu structure on top of their stacks.
It would be even more useful to save gsbase value at the moment of the
fault. I did not this because I do not want to modify PCB layout now.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kevans [Wed, 20 Nov 2019 05:04:44 +0000 (05:04 +0000)]
bcm2835_vcbus: add compatibility name for ^/sys/contrib/vchiq
It's unclear how this didn't get caught in my last iteration, but the fix is
easy- the interface is still compatible, it was just gratuituously renamed
to match my arbitrary definition of consistency... VCBUS, the BCM2835 name,
represents an address on the VideoCore CPU Bus.
In a similar fashion, while it is a physical address, the ARMC portion
represents that these are addresses as seen by the ARM CPU.
To make things even more fun, the BCM2711 peripheral documentation describes
not virtual address space vs. physical address space, but instead the 32-bit
address map vs. the address map in "Low Peripheral" mode. The latter of
these is what the *ARMC* macros translate to/from.
kevans [Wed, 20 Nov 2019 03:57:46 +0000 (03:57 +0000)]
bcm2835: push address mapping conversion for DMA/mailbox to runtime
We could maintain the static conversions for the !AArch64 Raspberry Pis, but
I'm not sure it's worth it -- we'll traverse the platform list exactly once
(of which there are only two for armv7), then every conversion there-after
traverses the memory map listing of which there are at-most two entries for
these boards: sdram and peripheral space.
Detecting this at runtime is necessary for the AArch64 SOC, though, because
of the distinct IO windows being otherwise not discernible just from support
compiled into the kernel. We currently select the correct window based on
/compatible in the FDT.
We also use a similar mechanism to describe the DMA restrictions- the RPi 4
can have up to 4GB of RAM while the DMA controller and mailbox mechanism can
technically, kind of, only access the lowest 1GB. See the comment in
bcm2835_vcbus.h for a fun description/clarification of this.
jeff [Wed, 20 Nov 2019 01:57:33 +0000 (01:57 +0000)]
When we set OFFPAGE to limit fragmentation we should also set VTOSLAB
so that we avoid the hashtables. The hashtable is now only required if
a zone is created with OFFPAGE specified initially, not internally. This
flag signals to UMA that it can't touch the allocated memory and so
can't store a slab pointer in the containing page.
jeff [Tue, 19 Nov 2019 23:19:43 +0000 (23:19 +0000)]
Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates
reudundant complicated checks and additional locking required only for
anonymous memory. Introduce vm_object_allocate_anon() to create these
objects. DEFAULT and SWAP objects now have the correct settings for
non-anonymous consumers and so individual consumers need not modify the
default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.
kevans [Tue, 19 Nov 2019 23:12:43 +0000 (23:12 +0000)]
bcm2835_sdhci: various refactoring of DMA path
This round of refactoring is mostly about streamlining the interrupt handler
to make it easier to verify and reason about operations taking place while
trying to bring FreeBSD up on the RPi4.
vmaffione [Tue, 19 Nov 2019 21:10:44 +0000 (21:10 +0000)]
bhyve: virtio-net: disable receive until features are negotiated
This patch fixes a race condition where the receive callback is called
while the device is being reset. Since the rx_merge variable may change
during reset, the receive callback may operate inconsistently with what
the guest expects.
Also, get rid of the unused rx_vhdrlen variable.
bz [Tue, 19 Nov 2019 21:08:18 +0000 (21:08 +0000)]
nd6: sysctl
Move the SYSCTL_DECL to the top of the file. Move the sysctl function
before SYSCTL_PROC so that we don't need an extra function declaration in
the middle of the file.
bz [Tue, 19 Nov 2019 20:34:33 +0000 (20:34 +0000)]
nd6_rtr: re-sort functions
Resort functions within file in a way that they depend on each other as
that makes it easier to rework various things.
Also allows us to remove file local function declarations.
alc [Tue, 19 Nov 2019 19:05:05 +0000 (19:05 +0000)]
Achieve two goals at once: (1) Avoid an unnecessary broadcast TLB
invalidation in reclaim_pv_chunk(). (2) Prevent an "invalid ASID" assertion
failure in reclaim_pv_chunk(). The detailed explanation for this change is
provided by r354792.
bz [Tue, 19 Nov 2019 15:38:55 +0000 (15:38 +0000)]
Reduce the vnet_set module size of ip_mroute to allow loading as a module.
With VIMAGE kernels modules get special treatment as they need
to also keep the original values and make copies for each instance.
For that a few pages of vnet modspace are provided and the
kernel-linker and the VNET framework know how to deal with things.
When the modspace is (almost) full, other modules which would
overflow the modspace cannot be loaded and kldload will fail.
ip_mroute uses a lot of variable space, mostly be four big arrays:
set_vnet 0000000000000510 vnet_entry_multicast_register_if
set_vnet 0000000000000700 vnet_entry_viftable
set_vnet 0000000000002000 vnet_entry_bw_meter_timers
set_vnet 0000000000002800 vnet_entry_bw_upcalls
Dynamically malloc the three big ones for each instance we need
and free them again on vnet teardown (the 4th is an ifnet).
That way they only need module space for a single pointer and
allow a lot more modules using virtualized variables to be loaded
on a VNET kernel.
bz [Tue, 19 Nov 2019 14:53:13 +0000 (14:53 +0000)]
mld: fix epoch assertion
in6ifa_ifpforlinklocal() asserts the net epoch. The test case from r354832
revealed code paths where we call into the function without having
acquired the net epoch first and consequently we hit the assert.
This happens in certain MLD states during VNET shutdown and most people
normaly not notice this.
For correctness acquire the net epoch around calls to
mld_v1_transmit_report() in all cases to avoid the assertion firing.
dab [Tue, 19 Nov 2019 14:46:28 +0000 (14:46 +0000)]
Don't sanitize linker_set
The assumptions of linker_set don't play nicely with
AddressSanitizer. AddressSanitizer adds a 'redzone' of zeros around
globals (including those in named sections), whereas linker_set
assumes they are all packed consecutively like a pointer array. So:
let's annotate linker_set so that AddressSanitizer ignores it.
dougm [Tue, 19 Nov 2019 08:06:31 +0000 (08:06 +0000)]
Drop the extra argument from swp_pager_meta_ctl and have it do lookup
only. Rename it swp_pager_meta_lookup. Stop checking for obj->type
== swap there and assert it instead. Make the caller responsible for
the obj->type check.
Move the meta_ctl 'pop' functionality to swap_pager_unswapped, the
only place that uses it, and assume obj->type == swap there too.
cem [Tue, 19 Nov 2019 04:30:23 +0000 (04:30 +0000)]
unifdef(1): Improve worst-case bound on symbol resolution
Use RB_TREE to make some algorithms O(lg N) and O(N lg N) instead of O(N)
and O(N^2). Because N is typically small and the former linear array also has
great constant factors (as a property of CPU caching), this doesn't provide
material benefit most or all of the time.
While here, remove arbitrarily limit on number of macros understood.
This allows easy and care-free scaling of NUM_DMA_SEGS with proper-ish
calculations to make sure we can actually handle the number of segments we'd
like to handle on average so that performance comparisons can be easily made
at different values if/once we can actually handle it. It also makes it
helps the untrained reader understand more quickly the reasoning behind the
choice of maxsize/maxsegs/maxsegsize.
jhibbits [Tue, 19 Nov 2019 02:00:13 +0000 (02:00 +0000)]
powerpc/pmap: Remove an unused error from moea64_pvo_enter()
ENOENT is leftover from mmu_oea.c's moea_pvo_enter(), where it's used to
syncicache() on the first new mapping of a page. This sync is done
differently in OEA64.
jhibbits [Tue, 19 Nov 2019 01:28:06 +0000 (01:28 +0000)]
powerpc/booke pmap: Use the right 'tlbilx' form to invalidate TIDs
'tlbilxpid' is 'tlbilx 1, 0', while the existing form is 'tlbilx 0, 0',
which translates to 'tlbilxlpid', invalidating a LDPID. This effectively
invalidates the entire TLB, causing unnecessary reloads.
kevans [Mon, 18 Nov 2019 23:31:12 +0000 (23:31 +0000)]
sysent: regenerate after r354835
The lua-based makesyscalls produces slightly different output than its
makesyscalls.sh predecessor, all whitespace differences more closely
matching the source syscalls.master.
kevans [Mon, 18 Nov 2019 23:28:23 +0000 (23:28 +0000)]
Convert in-tree sysent targets to use new makesyscalls.lua
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.
kevans [Mon, 18 Nov 2019 23:21:13 +0000 (23:21 +0000)]
Add flua to the base system, install to /usr/libexec
FreeBSDlua ("flua") is a FreeBSD-private lua, flavored with whatever
extensions we need for base system operations. We currently support a subset
of lfs and lposix that are used in the rewrite of makesyscall.sh into lua,
added in r354786.
flua is intentionally written such that one can install standard lua and
some set of lua modules from ports and achieve the same effect.
linit_flua is a copy of linit.c from contrib/lua with lfs and lposix added
in. This is similar to what we do in stand/. linit.c has been renamed to
make it clear that this has flua-specific bits.
luaconf has been slightly obfuscated to make extensions more difficult. Part
of the problem is that flua is already hard enough to use as a bootstrap
tool because it's not in PATH- attempting to do extension loading would
require a special bootstrap version of flua with paths changed to protect
the innocent.
src.lua.mk has been added to make it easy for in-tree stuff to find flua,
whether it's bootstrap-flua or relying on PATH frobbing by Makefile.inc1.
bz [Mon, 18 Nov 2019 21:59:47 +0000 (21:59 +0000)]
icmpv6: Fix mbuf change in mld
After r354748 mld_input() can change the mbuf. The new pointer
is never returned to icmp6_input() and when passed to
icmp6_rip6_input() the mbuf may no longer valid leading to
a panic.
Pass a pointer to the mbuf to mld_input() so we can return an
updated version in the non-error case.
Add a test sending an MLD packet case which will trigger this bug.
kib [Mon, 18 Nov 2019 20:56:59 +0000 (20:56 +0000)]
bus_dma_dmar_set_buswide(9): KPI to indicate that the whole dmar
context should share page tables.
Practically it means that dma requests from any device on the bus are
translated according to the entries loaded for the bus:0:0 device.
KPI requires that the slot and function of the device be 0:0, and that
no tags for other devices on the bus were used.
The intended use are NTBs which pass TLPs from the downstream to the
host with slot:func of the downstream originator.
Reviewed and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22434
markj [Mon, 18 Nov 2019 20:03:28 +0000 (20:03 +0000)]
Set MALLOC_DEBUG_MAXZONES=1 in GENERIC-NODEBUG configurations.
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory. However, it
increases fragmentation and is disabled in release kernels.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
kevans [Mon, 18 Nov 2019 19:28:09 +0000 (19:28 +0000)]
bcm2835_sdhci: use a macro for interrupts we handle
This is just further simplification, very little functional change. In the
DMA interrupt handler, we *do* now acknowledge both DATA_AVAIL | SPACE_AVAIL
every time -- these operations are mutually exclusive, so while this is a
functional change, it's effectively a nop. Removing the 'mask' local allows
us to further simplify in a future change.
kevans [Mon, 18 Nov 2019 18:40:35 +0000 (18:40 +0000)]
bcm2835_sdhci: push DATA_END handling out of DMA interrupt path
This simplifies the DMA interrupt handler quite a bit. The sdhci framework
will call platform_finish_transfer() if it's received SDHCI_INT_DATA_END, so
we can take care of any final cleanup there and simply not worry about the
possibility of it ending in the DMA interrupt path.
markj [Mon, 18 Nov 2019 18:34:23 +0000 (18:34 +0000)]
Fix inconsistencies in anonymous DOF files.
The DOF file output by dtrace -A contains only the loadable sections.
However, as it was created by a call to dtrace_dof_create() without
flags, the original DOF was created with the loadable sections. The
result is that the DOF includes the section headers for the unloadable
sections (COMMENTS and UTSNAME) without these sections actually being
present. This is inconsistent.
A simple change to anon_prog() ensures that the missing sections are
present in the outputted DOF. Alternatively, the call to
dtrace_dof_create() could pass the DTRACE_D_STRIP flag stripping out the
loadable sections. As the unloadable sections contain info useful for
debugging purposes they haven't been stripped.
markj [Mon, 18 Nov 2019 18:25:51 +0000 (18:25 +0000)]
Group per-domain reservation data in the same structure.
We currently have the per-domain partially populated reservation queues
and the per-domain queue locks. Define a new per-domain padded
structure to contain both of them. This puts the queue fields and lock
in the same cache line and avoids the false sharing within the old queue
array.
Also fix field packing in the reservation structure. In many places we
assume that a domain index fits in 8 bits, so we can do the same there
as well. This reduces the size of the structure by 8 bytes.
Update some comments while here. No functional change intended.
Reviewed by: dougm, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22391
markj [Mon, 18 Nov 2019 18:22:41 +0000 (18:22 +0000)]
Widen the vm_page aflags field to 16 bits.
We are now out of aflags bits, whereas the "flags" field only makes use
of five of its sixteen bits, so narrow "flags" to eight bits. I have no
intention of adding a new aflag in the near future, but would like to
combine the aflags, queue and act_count fields into a single atomically
updated word. This will allow vm_page_pqstate_cmpset() to become much
simpler and is a step towards eliminating the use of the page lock array
in updating per-page queue state.
The change modifies the layout of struct vm_page, so bump
__FreeBSD_version.
dab [Mon, 18 Nov 2019 13:31:16 +0000 (13:31 +0000)]
Jail and capability mode for shm_rename; add audit support for shm_rename
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail,
capability mode, and a few other things
* Adding audit support as promised.
The audit support change includes a partial refresh of OpenBSM from
upstream, where the change to add shm_rename has already been
accepted. Matthew doesn't plan to work on refreshing anything else to
support audit for those new event types.
avg [Mon, 18 Nov 2019 10:34:27 +0000 (10:34 +0000)]
fix up r354804, link zstreamdump with libzfs
Since r354804 libzpool depends on libzfs for get_system_hostid symbol.
Except for zstreamdump, all binaries linked with libzpool were already
linked with libzfs. So, zstreamdump is the only fall-out.
It's interesting that on amd64 not only I was able to successfully build
zstreamdump, I am able to run it despite having the unresolved symbol in
libzpool.
10701 Correct lock ASSERTs in vdev_label_read/write
illumos/illumos-gate@58447f688d5e308373ab16a3b129bc0ba0fbc154
https://github.com/illumos/illumos-gate/commit/58447f688d5e308373ab16a3b129bc0ba0fbc154
https://www.illumos.org/issues/10701
Port of ZoL commit: 0091d66f4e Correct lock ASSERTs in vdev_label_read/write
At a minimum, this fixes a blown assert during an MMP test run when running on
a DEBUG build.
11770 additional mmp fixes
illumos/illumos-gate@4348eb901228d2f8fa50bb132a34248e8662074e
https://github.com/illumos/illumos-gate/commit/4348eb901228d2f8fa50bb132a34248e8662074e
https://www.illumos.org/issues/11770
Port a few additional MMP fixes from ZoL that came in after our
initial MMP port. 4ca457b065 ZTS: Fix mmp_interval failure ca95f70dff zpool import progress kstat
(only minimal changes from above can be pulled in right now) 060f0226e6 MMP interval and fail_intervals in uberblock
Note from the committer (me).
I do not have any use for this feature and I have not tested it. I only
did smoke testing with multihost=off.
Please be aware.
I merged the code only to make future merges easier.
Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Tim Chase <tim@chase2k.com>
Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com>
Portions contributed by: John L. Hammond <john.hammond@intel.com>
Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov>
Portions contributed by: Prakash Surya <surya1@llnl.gov>
Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Olaf Faaland <faaland1@llnl.gov>
jhibbits [Sun, 17 Nov 2019 20:49:24 +0000 (20:49 +0000)]
powerpc: Re-add -Wno-redundant-decls to DPAA build flags
Since the DPAA code is from a third party, with minimal edits, there is no
intent to fix these specific warnings at this time. Hide these warnings to
prevent the noise from hiding real warnings.