10701 Correct lock ASSERTs in vdev_label_read/write
illumos/illumos-gate@58447f688d5e308373ab16a3b129bc0ba0fbc154
https://github.com/illumos/illumos-gate/commit/58447f688d5e308373ab16a3b129bc0ba0fbc154
https://www.illumos.org/issues/10701
Port of ZoL commit: 0091d66f4e Correct lock ASSERTs in vdev_label_read/write
At a minimum, this fixes a blown assert during an MMP test run when running on
a DEBUG build.
11770 additional mmp fixes
illumos/illumos-gate@4348eb901228d2f8fa50bb132a34248e8662074e
https://github.com/illumos/illumos-gate/commit/4348eb901228d2f8fa50bb132a34248e8662074e
https://www.illumos.org/issues/11770
Port a few additional MMP fixes from ZoL that came in after our
initial MMP port:
4ca457b065 ZTS: Fix mmp_interval failure
ca95f70dff zpool import progress kstat
(only minimal changes from above can be pulled in right now)
060f0226e6 MMP interval and fail_intervals in uberblock
Note from the committer (me).
I do not have any use for this feature and I have not tested it. I only
did smoke testing with multihost=off.
Please be aware.
I merged the code only to make future merges easier.
Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Portions contributed by: Tim Chase <tim@chase2k.com>
Portions contributed by: sanjeevbagewadi <sanjeev.bagewadi@gmail.com>
Portions contributed by: John L. Hammond <john.hammond@intel.com>
Portions contributed by: Giuseppe Di Natale <dinatale2@llnl.gov>
Portions contributed by: Prakash Surya <surya1@llnl.gov>
Portions contributed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Olaf Faaland <faaland1@llnl.gov>
jhibbits [Sun, 17 Nov 2019 20:49:24 +0000 (20:49 +0000)]
powerpc: Re-add -Wno-redundant-decls to DPAA build flags
Since the DPAA code is from a third party, with minimal edits, there is no
intent to fix these specific warnings at this time. Hide these warnings to
prevent the noise from hiding real warnings.
alc [Sun, 17 Nov 2019 17:38:53 +0000 (17:38 +0000)]
Achieve two goals at once: (1) Avoid an unnecessary broadcast TLB
invalidation in pmap_remove_all(). (2) Prevent an "invalid ASID" assertion
failure in pmap_remove_all().
The architecture definition specifies that the TLB will not cache mappings
that don't have the "AF" bit set, so pmap_remove_all() needn't issue a TLB
invalidation for mappings that don't have the "AF" bit set.
We allocate ASIDs lazily. Specifically, we don't allocate an ASID for a
pmap until we are activating it. Now, consider what happens on a fork().
Before we activate the child's pmap, we use pmap_copy() to copy mappings
from the parent's pmap to the child's. These new mappings have their "AF"
bits cleared. Suppose that the page daemon decides to reclaim a page that
underlies one of these new mappings. Previously, the pmap_invalidate_page()
performed by pmap_remove_all() on a mapping in the child's pmap would fail
an assertion because that pmap hasn't yet been assigned an ASID. However,
we don't need to issue a TLB invalidation for such mappings because they
can't possibly be cached in the TLB.
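A minimal userspace sketch of the reasoning above. It assumes a simplified PTE word in which bit 10 plays the role of the arm64 "AF" (Access Flag) bit; `struct pmap_sketch`, `SKETCH_ATTR_AF` and `remove_mapping_sketch()` are hypothetical names, not the FreeBSD pmap code. Removing a mapping whose AF bit is clear skips the invalidation entirely, so a freshly forked child pmap is never asked for an ASID it does not have yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the arm64 Access Flag in a page descriptor. */
#define SKETCH_ATTR_AF (UINT64_C(1) << 10)

/* Hypothetical pmap: asid stays -1 until the pmap is first activated. */
struct pmap_sketch {
	int asid;
	int tlb_invalidations;	/* counts invalidations issued */
};

/* The TLB never caches a mapping whose AF bit is clear. */
static bool
pte_may_be_cached(uint64_t pte)
{
	return ((pte & SKETCH_ATTR_AF) != 0);
}

/*
 * Clear one mapping.  Returns false if an invalidation would have been
 * needed for a pmap that has no ASID yet (the old assertion failure).
 */
static bool
remove_mapping_sketch(struct pmap_sketch *pmap, uint64_t *ptep)
{
	uint64_t old = *ptep;

	*ptep = 0;
	if (!pte_may_be_cached(old))
		return (true);	/* never in the TLB: nothing to flush */
	if (pmap->asid < 0)
		return (false);	/* would have tripped "invalid ASID" */
	pmap->tlb_invalidations++;	/* stand-in for a TLBI by ASID */
	return (true);
}
```

A mapping copied by pmap_copy() starts with AF clear, so removing it in the child costs no broadcast invalidation at all.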
kevans [Sun, 17 Nov 2019 14:08:19 +0000 (14:08 +0000)]
Add makesyscalls.lua, a rewrite of makesyscalls.sh
This currently requires a suitable lua + luafilesystem + luaposix from the
ports tree to build. Discussion is underway in D21893 to add a suitable lua
to the base system, cleverly disguised and out of the way of normal
consumers.
makesyscalls.sh is a good target for rewrite into lua as it's currently a
sh+sed+awk script that can be difficult to add on to, at times. For
instance, adding a new COMPAT* option (that mimics the behavior of most
other COMPAT* options) requires a fairly substantial amount of copy/paste;
see r352693 for instance. An attempt to generate part of the awk script for
COMPAT* handling was (very kindly) rejected with a desire to just rewrite
the script in a single language that can handle all of it.
jhibbits [Sat, 16 Nov 2019 16:36:20 +0000 (16:36 +0000)]
powerpcspe: Don't leak kernel registers in SPE dumps
save_vec_int() for SPE saves off only the high word of the register, leaving
the low word as "garbage", but really containing whatever was in the kernel
register at the time. This leaks into core dumps, and in a near future
commit also into ptrace. Instead, save the GPR in the low word in
save_vec_nodrop(), which is used only for core dumps and ptrace.
jhibbits [Sat, 16 Nov 2019 16:27:31 +0000 (16:27 +0000)]
powerpcspe: Mark asm statement in spe_save_reg_high as clobbering memory
Modern gcc errors that "'vec[0]' is used uninitialized in this function"
without us telling it that vec is clobbered. Neither clang nor gcc 4.2.1
error on the existing construct.
tuexen [Sat, 16 Nov 2019 11:57:12 +0000 (11:57 +0000)]
Improve TCP CUBIC specific after idle reaction.
The adjustments are inspired by the Linux stack, which has had a
functionally equivalent implementation for more than a decade now.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18982
tuexen [Sat, 16 Nov 2019 11:37:26 +0000 (11:37 +0000)]
Implement a TCP CUBIC-specific after idle reaction.
This patch addresses a very common case of frequent application stalls,
where TCP runs idle and loses the state of the network.
Submitted by: Richard Scheffenegger
Reviewed by: Cheng Cui
Differential Revision: https://reviews.freebsd.org/D18954
scottl [Sat, 16 Nov 2019 00:26:42 +0000 (00:26 +0000)]
TSX Asynchronous Abort mitigation for Intel CVE-2019-11135.
This CVE has already been announced in FreeBSD SA-19:26.mcu.
Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.
Control knobs are:
    machdep.mitigations.taa.enable:
        0 - no software mitigation is enabled
        1 - attempt to disable TSX
        2 - use the VERW mitigation
        3 - automatically select the mitigation based on processor
            features.
    machdep.mitigations.taa.state:
        inactive       - no mitigation is active/enabled
        TSX disable    - TSX is disabled in the bare metal CPU as well as
                         any virtualized CPUs
        VERW           - VERW instruction clears CPU buffers
        not vulnerable - The CPU has identified itself as not being
                         vulnerable
Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.
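The knob-to-state relationship above can be sketched as a small selection function. This is a hypothetical model, not the FreeBSD implementation: `struct taa_cpu`, its three feature flags and `taa_select()` are illustrative, and the order of checks (not-vulnerable first, then prefer disabling TSX over VERW in automatic mode) is an assumption.

```c
#include <stdbool.h>

/* Values mirror the machdep.mitigations.taa.state strings above. */
enum taa_state {
	TAA_INACTIVE,
	TAA_TSX_DISABLE,
	TAA_VERW,
	TAA_NOT_VULNERABLE,
};

/* Hypothetical summary of the CPU feature bits consulted. */
struct taa_cpu {
	bool taa_no;	/* CPU reports itself as not vulnerable */
	bool has_tsx;	/* TSX instructions present */
	bool tsx_ctrl;	/* TSX can be disabled via an MSR */
};

static enum taa_state
taa_select(int enable, const struct taa_cpu *cpu)
{
	if (cpu->taa_no || !cpu->has_tsx)
		return (TAA_NOT_VULNERABLE);
	switch (enable) {
	case 1:
		return (cpu->tsx_ctrl ? TAA_TSX_DISABLE : TAA_INACTIVE);
	case 2:
		return (TAA_VERW);
	case 3:
		/* Assumed preference: disable TSX when possible, else VERW. */
		return (cpu->tsx_ctrl ? TAA_TSX_DISABLE : TAA_VERW);
	default:
		return (TAA_INACTIVE);	/* 0 or unknown: no mitigation */
	}
}
```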
bz [Sat, 16 Nov 2019 00:17:35 +0000 (00:17 +0000)]
nd6: retire defrouter_select(), use _fib() variant.
Burn bridges and replace the last two calls of defrouter_select() with
defrouter_select_fib(). That allows us to retire defrouter_select()
and make it more clear in the calling code that it applies to all FIBs.
bz [Sat, 16 Nov 2019 00:02:36 +0000 (00:02 +0000)]
nd6_rtr:
Pull in the TAILQ_HEAD() as it is not needed outside nd6_rtr.c.
Rename the TAILQ_HEAD() struct and the nd_defrouter variable from
"nd_" to "nd6_" as they are not part of the RFC 3542 API which uses "ND_".
Ideally I'd like to also rename the struct nd_defrouter {} to "nd6_*"
but given that is used externally there is more work to do.
scottl [Fri, 15 Nov 2019 23:27:17 +0000 (23:27 +0000)]
Create a new sysctl subtree, machdep.mitigations. Its purpose is to organize
knobs and indicators for code that mitigates functional and security issues
in the architecture/platform. Controls for regular operational policy should
still go into places such as security, hw, kern, etc.
The machdep root node is inherently architecture-dependent, but mitigations
tend to be architecture-dependent as well. Some cases like Spectre do cross
architectural boundaries, but the mitigation code for them tends to be
architecture-dependent anyway, and multiple architectures won't be active
in the same image of the kernel.
Many mitigation knobs already exist in the system, and they will be moved
with compat naming in the future. Going forward, mitigations should collect
in machdep.mitigations.
bz [Fri, 15 Nov 2019 23:12:19 +0000 (23:12 +0000)]
if_llatbl: change htable_unlink_entry() to early exit if no work to do
Adjust the logic in htable_unlink_entry() to the one in
htable_link_entry() saving a block indent and making it more clear
in which case we do not do any work.
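The early-exit shape described above, sketched on a hypothetical doubly linked hash-bucket chain. `struct lle_sketch`, `struct bucket_sketch` and the `linked` flag are illustrative stand-ins, not the if_llatbl.c types; the point is that the "nothing to do" case returns first, so the real work sits one indent level shallower.

```c
#include <stddef.h>

/* Hypothetical hash-bucket entry with a doubly linked chain. */
struct lle_sketch {
	struct lle_sketch *next, *prev;
	int linked;		/* non-zero while on a bucket chain */
};

struct bucket_sketch {
	struct lle_sketch *head;
};

static void
unlink_entry_sketch(struct bucket_sketch *b, struct lle_sketch *e)
{
	/* Early exit: not linked, so there is no work to do. */
	if (!e->linked)
		return;

	if (e->prev != NULL)
		e->prev->next = e->next;
	else
		b->head = e->next;
	if (e->next != NULL)
		e->next->prev = e->prev;
	e->next = e->prev = NULL;
	e->linked = 0;
}
```

Calling it twice on the same entry is harmless: the second call takes the early return.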
mav [Fri, 15 Nov 2019 23:01:09 +0000 (23:01 +0000)]
Initialize *comp_update with valid value.
I've noticed that sometimes, with DMAR enabled, the initial write from the
device to this address is somehow delayed, triggering an assertion because
the zero default is an invalid value.
mav [Fri, 15 Nov 2019 22:47:59 +0000 (22:47 +0000)]
Cleanup address range checks in ioat(4).
- Deduce allowed address range for bus_dma(9) from the hardware version.
Different versions (CPU generations) have different documented limits.
- Remove difference between address ranges for src/dst and crc. At least
docs for a few recent generations of CPUs do not mention anything like that,
while older ones are already limited by the above limits.
- Remove address assertions from arguments. While I do not think
addresses out of the allowed ranges should realistically happen there due to
the platform's physical address limitations, there is now bus_dma(9) to
make sure of that, preferably via IOMMU.
- Since crc now has the same address range as src/dst, remove crc_dmamap,
reusing dst2_dmamap instead.
Discussed with: cem
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
bz [Fri, 15 Nov 2019 21:55:41 +0000 (21:55 +0000)]
Remove now unused IPv6 macros and update docs.
After r354748-354750 all uses of the IP6_EXTHDR_CHECK() and
IP6_EXTHDR_GET() macros are gone from the kernel. IP6_EXTHDR_GET0()
was unused. Remove the macros and update the documentation.
bz [Fri, 15 Nov 2019 21:51:43 +0000 (21:51 +0000)]
IP6_EXTHDR_CHECK(): remove the last instances
While r354748 removed almost all IP6_EXTHDR_CHECK() calls, these
remaining ones are not part of the PULLDOWN_TESTS.
Likewise, convert these IP6_EXTHDR_CHECK()s to m_pullup() and remove
the extra check and m_pullup() in tcp_input() under isipv6 given
tcp6_input() has done exactly that pullup already.
bz [Fri, 15 Nov 2019 21:44:17 +0000 (21:44 +0000)]
netinet*: replace IP6_EXTHDR_GET()
In a few places we have IP6_EXTHDR_GET() left in upper layer protocols.
The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data
fragment is not contiguous.
Convert these last remaining instances into m_pullup()s instead.
In CARP, for example, we call m_pullup() a few lines later anyway, and
the IPsec code coming from OpenBSD would otherwise have done the m_pullup()
and copies the data a bit later anyway, so pulling it in seems no
better or worse.
Note: this leaves very few m_pulldown() cases behind in the tree and we
might want to consider removing them as well to make mbuf management
easier again on a path to variable size mbufs, especially given
m_pulldown() still has the issue of not re-checking M_WRITABLE().
bz [Fri, 15 Nov 2019 21:40:40 +0000 (21:40 +0000)]
netinet6: Remove PULLDOWN_TESTs.
Remove the KAME introduced PULLDOWN_TESTs which did not even
have a compile-time option in sys/conf to turn them on for a
custom kernel build. They made the code a lot harder to read
or more complicated in a few cases.
Convert the IP6_EXTHDR_CHECK() calls into FreeBSD-style code.
Rather than throwing the packet away if it would not fit the
KAME mbuf expectations, convert the macros to m_pullup() calls.
Do not do any extra manual conditional checks upfront as to
whether the m_len would suffice (*), simply let m_pullup() do
its work (incl. an early check).
Remove extra m_pullup() calls where earlier in the function or
the only caller has already done the pullup.
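To illustrate the m_pullup() pattern these commits converge on, here is a simplified userspace model of a buffer chain. Everything here is hypothetical: `struct mock_mbuf`, `MOCK_MLEN`, `mock_pullup()` and `mock_input()` only mimic the m_pullup(9) contract (make the first `len` bytes contiguous, or free the whole chain and return NULL). The real function can allocate a fresh mbuf, which this sketch does not model.

```c
#include <stdlib.h>
#include <string.h>

#define MOCK_MLEN 64	/* hypothetical per-buffer data capacity */

struct mock_mbuf {
	struct mock_mbuf *next;
	size_t len;			/* bytes used in data[] */
	unsigned char data[MOCK_MLEN];
};

static void
mock_freem(struct mock_mbuf *m)
{
	while (m != NULL) {
		struct mock_mbuf *n = m->next;
		free(m);
		m = n;
	}
}

/* Make the first len bytes contiguous; on failure free the chain. */
static struct mock_mbuf *
mock_pullup(struct mock_mbuf *m, size_t len)
{
	if (m->len >= len)		/* early check: already contiguous */
		return (m);
	if (len > MOCK_MLEN) {		/* can never fit in one buffer */
		mock_freem(m);
		return (NULL);
	}
	while (m->len < len && m->next != NULL) {
		struct mock_mbuf *n = m->next;
		size_t want = len - m->len;
		size_t take = n->len < want ? n->len : want;

		memcpy(m->data + m->len, n->data, take);
		m->len += take;
		memmove(n->data, n->data + take, n->len - take);
		n->len -= take;
		if (n->len == 0) {	/* drained: unlink and free it */
			m->next = n->next;
			free(n);
		}
	}
	if (m->len < len) {		/* chain shorter than len */
		mock_freem(m);
		return (NULL);
	}
	return (m);
}

/* Caller pattern replacing an IP6_EXTHDR_CHECK()-style macro. */
static int
mock_input(struct mock_mbuf **mp, size_t off, size_t hdrlen)
{
	struct mock_mbuf *m = mock_pullup(*mp, off + hdrlen);

	*mp = m;
	if (m == NULL)
		return (-1);	/* chain already freed: just drop */
	/* The header at m->data + off is now safe to dereference. */
	return (0);
}
```

The caller needs no manual `m_len` test of its own, because the early check inside the pullup already covers the common case.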
bz [Fri, 15 Nov 2019 21:19:06 +0000 (21:19 +0000)]
Allow per-file lex and yacc options.
In order to allow software with multiple (different) options
for lex and yacc add extra per-file options to the calls.
This is especially useful when one .l file needs -Pprefix.
jhb [Fri, 15 Nov 2019 18:42:13 +0000 (18:42 +0000)]
Add a sv_copyout_auxargs() hook in sysentvec.
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
arichardson [Fri, 15 Nov 2019 18:34:36 +0000 (18:34 +0000)]
Fix build race in bsd.files.mk
We need to ensure that installdirs-FOO runs before installfiles-FOO since
otherwise the directory may not exist when we attempt to install the target.
This was randomly causing failures in our Jenkins instance when installing
drti.o in cddl/lib/drti.
arichardson [Fri, 15 Nov 2019 16:43:36 +0000 (16:43 +0000)]
Use __ as the separator for the exported vars in bsd.compiler/linker.mk
By using '__' instead of '.' as the separator we can also support systems
that use dash as /bin/sh (it's the default shell on Ubuntu/Debian). Dash
will unset any environment variables whose names contain characters other
than alphanumerics and underscores, and therefore submakes will fail to
import the COMPILER_* variables if we use '.' as the separator.
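The rule behind this change is the POSIX shell definition of a variable name: letters, digits and underscores, not starting with a digit. A small illustrative checker follows; the function name and the sample names `COMPILER_TYPE.abc123` / `COMPILER_TYPE__abc123` are hypothetical stand-ins for the exported make variables, not the actual bsd.compiler.mk names.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * True if name matches [A-Za-z_][A-Za-z0-9_]*, i.e. it should survive
 * being passed through dash to a sub-make (C locale assumed).
 */
static bool
env_name_survives_dash(const char *name)
{
	if (name[0] == '\0' || isdigit((unsigned char)name[0]))
		return (false);
	for (const char *p = name; *p != '\0'; p++)
		if (!isalnum((unsigned char)*p) && *p != '_')
			return (false);
	return (true);
}
```

With the '.' separator the name fails the check, while the '__' form passes.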
dim [Fri, 15 Nov 2019 06:56:25 +0000 (06:56 +0000)]
Merge commit 5bbb604bb from llvm git (by Craig Topper):
[InstCombine] Disable some portions of foldGEPICmp for GEPs that
return a vector of pointers. Fix other portions.
llvm-svn: 370114
This should fix instances of 'Assertion failed: (isa<X>(Val) &&
"cast<Ty>() argument of incompatible type!"), function cast, file
/usr/src/contrib/llvm/include/llvm/Support/Casting.h, line 255', when
building openjdk8 for aarch64 and armv7.
jhibbits [Fri, 15 Nov 2019 04:33:07 +0000 (04:33 +0000)]
atomic: Add atomic_cmpset_masked to powerpc and use it
Summary:
This is a more efficient way of doing atomic_cmpset_masked() than the
fallback in sys/_atomic_subword.h. There's also an override for
_atomic_fcmpset_masked_word(), which may or may not be necessary, and is
unused for powerpc.
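For readers unfamiliar with the operation, a masked compare-and-set can be emulated portably with a C11 compare-exchange loop on the containing word. This is a hypothetical sketch of the semantics (the name and the convention that `old` and `val` are pre-shifted into the mask's position are assumptions), closer in spirit to a generic fallback than to the powerpc version, which can use load-reserve/store-conditional directly.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-set only the bits selected by mask.  old and val are
 * already shifted into the mask's position; bits outside mask are
 * left untouched.  Returns 1 on success, 0 if the masked bits differ.
 */
static int
cmpset_masked_sketch(_Atomic uint32_t *addr, uint32_t old, uint32_t val,
    uint32_t mask)
{
	uint32_t cur = atomic_load(addr);

	for (;;) {
		if ((cur & mask) != (old & mask))
			return (0);
		uint32_t next = (cur & ~mask) | (val & mask);
		/* On failure cur is reloaded: other bits may have changed. */
		if (atomic_compare_exchange_weak(addr, &cur, next))
			return (1);
	}
}
```

A CAS failure caused only by unrelated bits changing simply retries, which is what makes the subword operation safe next to neighbouring bytes.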
mhorne [Fri, 15 Nov 2019 03:40:02 +0000 (03:40 +0000)]
RISC-V: Print SBI info at startup
SBI version 0.2 introduces functions for obtaining the details of the
SBI implementation, such as version and implementation ID. Print this
info at startup when it is available.
mhorne [Fri, 15 Nov 2019 03:34:27 +0000 (03:34 +0000)]
RISC-V: add support for SBI spec v0.2
The Supervisor Binary Interface (SBI) specification v0.2 is a backwards
incompatible update to the SBI call interface for kernels running in
supervisor mode. The goal of this update was to make it easier for new
and optional functionality to be added to the SBI.
SBI functions are now called by passing an "extension ID" and a
"function ID" which are passed in a7 and a6 respectively. SBI calls
will also return an error and value in the following struct:
	struct sbi_ret {
		long error;
		long value;
	};
This version introduces several new functions under the "base"
extension. It is expected that all SBI implementations >= 0.2 will
support this base set of functions, as they implement some essential
services such as obtaining the SBI version, CPU implementation info, and
extension probing.
Existing SBI functions have been designated as "legacy". For the time
being they will remain implemented, but it is expected that in the
future their functionality will be duplicated or replaced by new SBI
extensions. Each legacy function has been assigned its own extension ID,
and for now we simply probe and assert for their existence.
Compatibility with legacy SBI implementations (such as BBL) is
maintained by checking the output of sbi_get_spec_version(). This
function is guaranteed to succeed by the new spec, but will return an
error in legacy implementations. We use this as an indicator of whether
or not we can rely on the new SBI base extensions.
For further info on the Supervisor Binary Interface, see:
https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
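The legacy-detection logic can be sketched in userspace by replacing the ecall with a function pointer. The base extension ID 0x10, function ID 0 for sbi_get_spec_version(), and the version encoding (minor in bits 23:0, major in bits 30:24) follow the SBI v0.2 specification; `sbi_ecall_fn`, `sbi_probe_spec_version()` and the two firmware models are hypothetical, and the legacy error value -1 is an assumption (the text above only says it returns an error).

```c
#include <stdbool.h>

struct sbi_ret {
	long error;
	long value;
};

#define SBI_SUCCESS			0
#define SBI_EXT_ID_BASE			0x10
#define SBI_BASE_GET_SPEC_VERSION	0

/* Stand-in for the ecall: ext would go in a7, func in a6. */
typedef struct sbi_ret (*sbi_ecall_fn)(long ext, long func);

/* True (and version decoded) if the v0.2 base extension exists. */
static bool
sbi_probe_spec_version(sbi_ecall_fn ecall, long *major, long *minor)
{
	struct sbi_ret ret = ecall(SBI_EXT_ID_BASE,
	    SBI_BASE_GET_SPEC_VERSION);

	if (ret.error != SBI_SUCCESS)
		return (false);			/* legacy SBI, e.g. BBL */
	*major = (ret.value >> 24) & 0x7f;	/* bits 30:24 */
	*minor = ret.value & 0xffffff;		/* bits 23:0 */
	return (true);
}

/* Hypothetical firmware models used to exercise the probe. */
static struct sbi_ret
fw_v02(long ext, long func)
{
	if (ext == SBI_EXT_ID_BASE && func == SBI_BASE_GET_SPEC_VERSION)
		return ((struct sbi_ret){ SBI_SUCCESS, (0L << 24) | 2L });
	return ((struct sbi_ret){ -2, 0 });	/* SBI_ERR_NOT_SUPPORTED */
}

static struct sbi_ret
fw_legacy(long ext, long func)
{
	(void)ext;
	(void)func;
	return ((struct sbi_ret){ -1, 0 });	/* assumed legacy failure */
}
```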
mhorne [Fri, 15 Nov 2019 03:22:08 +0000 (03:22 +0000)]
RISC-V: pass arg6 in sbi_call
Allow for an additional argument to sbi_call which will be passed in a6.
This is required for SBI spec 0.2 support, as a6 will indicate the SBI
function ID.
While here, introduce some macros to clean up the calls.
mhorne [Fri, 15 Nov 2019 03:18:11 +0000 (03:18 +0000)]
plic: support irq distribution
Our PLIC implementation only enables interrupts on the boot cpu.
Implement plic_bind_intr() so that they can be redistributed near the
end of boot during intr_irq_shuffle().
This also slightly modifies how enable bits are handled in an attempt to
better fit the PIC interface. plic_enable_intr()/plic_disable_intr() are
converted to manage an interrupt source's threshold value, since this
value can be used to globally enable/disable an irq. All handling of the
per-context enable bits is moved to the new methods plic_setup_intr()
and plic_bind_intr().
mhorne [Fri, 15 Nov 2019 03:15:14 +0000 (03:15 +0000)]
plic: fix context calculation
The RISC-V PLIC (platform level interrupt controller) registers are divided up
by "context", which is purposefully left ambiguous in the PLIC spec. Currently
we assume each CPU number corresponds 1-to-1 with a context number, but that is
not correct. Most existing PLIC implementations (such as SiFive's) have
multiple contexts per-cpu. For example, a single CPU might have a context for
machine mode interrupts and a context for supervisor mode interrupts. To
complicate things further, FreeBSD renumbers the CPUs during boot, but the PLIC
driver still assumes that CPU ID equals the RISC-V hart number, meaning
interrupt enables/claims might be performed for the wrong context registers.
To fix this, we must calculate each CPU's context number during
attachment. This is done by reading the interrupt properties from the
device tree, from which a mapping from context to RISC-V hart to CPU
number can be created.
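A sketch of the context-mapping step. It assumes the PLIC node's interrupts-extended cells have already been flattened into (hart, cause) pairs, one per context in device-tree order; resolving each cell's interrupt-controller phandle to a hart is elided. Causes 9 and 11 are the RISC-V supervisor and machine external interrupt codes. `struct plic_ctx_cell`, `plic_map_contexts()` and the `hart_of_cpu[]` table (standing in for the kernel's CPU renumbering) are hypothetical.

```c
/* RISC-V interrupt causes seen in the PLIC's interrupts-extended. */
#define IRQ_EXTERNAL_SUPERVISOR	9
#define IRQ_EXTERNAL_MACHINE	11

/* One (hart, cause) pair per context, in device-tree order. */
struct plic_ctx_cell {
	int hart;
	int cause;
};

/*
 * Fill ctx_of_cpu[] with each CPU's supervisor-mode context, using
 * hart_of_cpu[] to undo the kernel's CPU renumbering.  -1 = none.
 */
static void
plic_map_contexts(const struct plic_ctx_cell *cells, int ncells,
    const int *hart_of_cpu, int *ctx_of_cpu, int ncpus)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		ctx_of_cpu[cpu] = -1;
		for (int ctx = 0; ctx < ncells; ctx++) {
			if (cells[ctx].hart == hart_of_cpu[cpu] &&
			    cells[ctx].cause == IRQ_EXTERNAL_SUPERVISOR) {
				ctx_of_cpu[cpu] = ctx;
				break;
			}
		}
	}
}
```

On a SiFive-like layout where hart 0 only has a machine-mode context, CPU 0 (hart 1) ends up on context 2, not context 0.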
jpaetzel [Thu, 14 Nov 2019 23:31:20 +0000 (23:31 +0000)]
Add the pvscsi driver to the tree.
This driver allows the use of the paravirtual SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
jhibbits [Thu, 14 Nov 2019 21:58:40 +0000 (21:58 +0000)]
Boot arm64 kernel using booti command from U-boot.
Summary:
Boot arm64 kernel using booti command from U-boot. booti can relocate the initrd
image into higher RAM addresses, therefore align the initrd load address to 1GiB
and create a VA = PA map for it. Create L2 pagetable entries to copy the initrd
image into KVA.
(parts of the code in https://reviews.freebsd.org/D13861 were referenced and used
as appropriate)
Submitted by: Siddharth Tuli <siddharthtuli_gmail.com>
Reviewed by: manu
Sponsored by: Juniper Networks, Inc
Differential Revision: https://reviews.freebsd.org/D22255
kevans [Thu, 14 Nov 2019 18:38:56 +0000 (18:38 +0000)]
arm64: busdma_bounce: fix BUS_DMA_ALLOCNOW for non-page-aligned sizes
For any size that isn't page-aligned, we end up not pre-allocating enough
for a single mapping because we truncate the size instead of rounding up to
make sure the last bit is accounted for, leaving us one page shy of what we
need to fulfill a request.
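The off-by-one page count is easy to see in isolation. A minimal sketch assuming a 4 KiB page; the function names are hypothetical and not taken from busdma_bounce.c.

```c
#define SKETCH_PAGE_SIZE 4096UL

/* Buggy: truncation drops the partial last page. */
static unsigned long
bounce_pages_truncated(unsigned long size)
{
	return (size / SKETCH_PAGE_SIZE);
}

/* Fixed: round up so a partial page still gets a bounce page. */
static unsigned long
bounce_pages_rounded(unsigned long size)
{
	return ((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}
```

A 6000-byte request truncates to one page but needs two. The rounding form is the same arithmetic as the howmany() macro from sys/param.h.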
ian [Thu, 14 Nov 2019 16:46:27 +0000 (16:46 +0000)]
Rewrite arm/stack_machdep.c for EABI; add stack(9) support to arm kernels.
The old stack_machdep.c code was written for the APCS ABI (aka "oldabi").
When we switched to ARM EABI (back in FreeBSD 10) this file never got
updated, and apparently nobody noticed that until now.
The new implementation uses the same stack unwinder code used by the
arm implementation of the db_trace stuff.
bdragon [Thu, 14 Nov 2019 04:34:17 +0000 (04:34 +0000)]
powerpc: Kernel fixes for ppc32 and powerpcspe w/ lld
Fix wrong section ordering that was causing a ".got is not contiguous with
other relro sections" lld error. This also brings ldscript.powerpc and
ldscript.powerpcspe closer to ldscript.powerpc64.
Also, remove unnecessary text relocs from the ppc32 AIM trap code.
kib [Wed, 13 Nov 2019 22:39:46 +0000 (22:39 +0000)]
amd64: only set PCB_FULL_IRET pcb flag when #gp or similar exception comes
from usermode.
If CPU supports RDFSBASE, the flag also means that userspace fsbase
and gsbase are already written into pcb, which might not be true when
we handle #gp from kernel.
The offender is rdmsr_safe(), and the visible result is corrupted
userspace TLS base.
Reported by: pstef
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
jhb [Wed, 13 Nov 2019 21:49:46 +0000 (21:49 +0000)]
Refine r354661 to unbreak the GCC_BOOTSTRAP case.
MK_CLANG_IS_CC controls installing links for GCC, not just clang. Set
MK_CLANG_IS_CC to the value of MK_CLANG_BOOTSTRAP. This will leave it
as "no" if no bootstrap compiler is being built or GCC 4.2.1 is being
used as the bootstrap compiler, and "yes" if clang is being used as
the bootstrap compiler.
Submitted by: bdrewery (kind of, he suggested this on IRC while I was
testing the original patch)
Reviewed by: kevans, imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22350
trasz [Wed, 13 Nov 2019 20:27:38 +0000 (20:27 +0000)]
Add 'linux_mounts_enable' rc.conf(5) variable, to make it possible
to disable mounting Linux-specific filesystems under /compat/linux
when 'linux_enable' is set to YES.
Reviewed by: netchild, ian (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22320
kevans [Wed, 13 Nov 2019 18:21:06 +0000 (18:21 +0000)]
ssp: further refine the conditional used for constructor priority
__has_attribute(__constructor__) is a better test for clang than
defined(__clang__). Switch to it instead.
While we're already here and touching it, pfg@ nailed down when GCC actually
introduced the priority argument -- 4.3. Use that instead of our
hammer-guess of GCC >= 5 for the sake of correctness.
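A standalone sketch of the kind of conditional described above: prefer `__has_attribute(__constructor__)` where the compiler provides it, and otherwise fall back to a GCC >= 4.3 version test for the priority argument. The macro names, the priority value 200 and `guard_setup_sketch()` are hypothetical; this is not the actual libc ssp code.

```c
#ifdef __has_attribute
#if __has_attribute(__constructor__)
#define SKETCH_CTOR_PRIO 1
#endif
#endif

/* GCC gained the constructor priority argument in 4.3. */
#if !defined(SKETCH_CTOR_PRIO) && defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define SKETCH_CTOR_PRIO 1
#endif

#ifdef SKETCH_CTOR_PRIO
#define SKETCH_CTOR __attribute__((__constructor__(200), __used__))
#else
#define SKETCH_CTOR __attribute__((__constructor__, __used__))
#endif

static int guard_ready;

/* Runs before main(), early relative to default-priority ctors. */
static void guard_setup_sketch(void) SKETCH_CTOR;

static void
guard_setup_sketch(void)
{
	guard_ready = 1;
}
```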
dougm [Wed, 13 Nov 2019 15:56:07 +0000 (15:56 +0000)]
Define functions vm_map_entry_{succ,pred} to act as wrappers
around entry->{next,prev} when those are used for ordered list
traversal, and use those wrapper functions everywhere. Where the next
field is used for maintaining a stack of deferred operations, #define
defer_next to make that different usage clearer, and then use the
'right' pointer instead of 'next' for that purpose.
Approved by: markj
Tested by: pho (as part of a larger patch)
Differential Revision: https://reviews.freebsd.org/D22347
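The two idioms in this change, sketched on a hypothetical entry type: named wrappers for ordered-list traversal, and a `#define` that gives the borrowed `right` pointer a descriptive name when it threads a stack of deferred entries. `struct entry_sketch`, `entry_succ()`, `entry_pred()`, `defer_push()` and `defer_pop()` are illustrative, not the vm_map.c code.

```c
#include <stddef.h>

/* Hypothetical map entry with list links and tree children. */
struct entry_sketch {
	struct entry_sketch *prev, *next;	/* ordered list */
	struct entry_sketch *left, *right;	/* tree links */
	int start;
};

/* Ordered-list traversal goes through named wrappers. */
static inline struct entry_sketch *
entry_succ(struct entry_sketch *e)
{
	return (e->next);
}

static inline struct entry_sketch *
entry_pred(struct entry_sketch *e)
{
	return (e->prev);
}

/* The deferred-work stack borrows 'right' under a clearer name. */
#define defer_next right

static void
defer_push(struct entry_sketch **stack, struct entry_sketch *e)
{
	e->defer_next = *stack;
	*stack = e;
}

static struct entry_sketch *
defer_pop(struct entry_sketch **stack)
{
	struct entry_sketch *e = *stack;

	if (e != NULL)
		*stack = e->defer_next;
	return (e);
}
```

Because the stack no longer uses `next`, an entry can sit on the deferred stack without disturbing its position in the ordered list.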
bz [Wed, 13 Nov 2019 12:05:48 +0000 (12:05 +0000)]
nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c
Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static. Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.
Given all nd_defrouter logic is now in one place we can tidy up the
code, locking, and other open items.
bz [Wed, 13 Nov 2019 11:21:02 +0000 (11:21 +0000)]
lltable: remove dead code
Remove the long (8? years ago) #if 0 marked function lltable_drain() and
while here also remove the unused function llentry_alloc(), which call-path
tools keep finding but which is never used.